1 
Easy Hacks to Improve 
Writer - OOXML Interoperability 
Sushil Shinde 
sushil.shinde@synerzip.com 
LibreOffice Conference 2014, Bern 
in.linkedin.com/pub/sushil-shinde/18/65b/452/
2 
About Me 
● Sr. Software Developer at Synerzip Softech India 
● About 3 years of experience in C++ and OOXML 
● Active contributor to LibreOffice product and community 
● Member of TDF. 
● Love to play and watch cricket 
● Email: Sushil.shinde@synerzip.com 
● IRC: #libreoffice-dev, nick: sushils_
3 
Topics 
● Interoperability 
● OOXML and ECMA-376 
● DOCX File Structure 
● Challenges during 'File Import' 
– File Crash 
– Data Loss 
● Challenges during 'File Export' 
– File Corruption 
– Data Loss 
● LibreOffice Hang Issues 
● Some Useful Tools 
● Examples
4 
Interoperability 
Many companies, government organizations, and individuals use MS Word file formats. 
MS Word formats: 
● .doc (binary file) 
● .docx (OOXML file format)
5 
OOXML and ECMA-376 
● Office Open XML (OOXML) 
– Microsoft Office 2007 and later versions (such as 2010 and 2013) use the OOXML format. 
● The ECMA-376 Standard 
– This Standard defines OOXML's vocabularies and 
document representation and packaging details. 
– Specifications are freely available on the ECMA 
website.
6 
DOCX File Structure 
DOCX file package: 
● _rels 
● docProps 
● word 
– _rels: a lookup for each item referenced in the document, header, or footer (e.g. images, sounds, headers, footers) 
– document.xml: the text of the document; contains links to other objects, which are retrieved via the lookup 
– header[n].xml, footer[n].xml: the text of the document's headers and footers; also contain references to other objects (e.g. images used in a header or footer) 
– styles.xml: the definitions for the set of styles used by the document 
– media: media files such as images, sounds, and video that are referenced in document.xml (e.g. image1.png) 
– theme 
– charts: chart data folder (chart[n].xml and chart[n].xml.rels) 
– .. 
● [Content_Types].xml: MIME type information for the parts of the package
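
Because a DOCX package is an ordinary ZIP archive, its parts can be listed with a few lines of script. The sketch below is only an illustration: the helper names `list_parts` and `build_sample_package` are hypothetical, and the sample package built in memory is a stand-in for a real file, not a document Word would open.

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of all parts stored in a DOCX package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return sorted(package.namelist())


def build_sample_package() -> bytes:
    """Build a tiny, hypothetical package in memory (not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/_rels/document.xml.rels", "<Relationships/>")
        package.writestr("word/styles.xml", "<styles/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_parts(build_sample_package()):
        print(name)
```

For a file on disk, the same function can be fed the result of reading the file in binary mode.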
7 
Challenges In 'File Import' 
● LibreOffice crash 
● Data loss 
● LibreOffice hangs
8 
File Import – Crash issues 
● Possible reasons: 
– Programming mistakes 
● Missing null pointer checks 
● Memory leaks 
– Issues in the import filters 
● Specific combinations of data that are not handled
9 
Analyzing Crash 
● Reduce the file to a minimal test case 
– Check which MS Office version (2007/2010/2013) was used to create the file 
– Use the “divide and conquer” method to reduce the file 
– Try to reduce the file to 1-2 pages with as little data as possible 
● Identify the XML part that causes the error 
● Try to identify the MS Office feature that causes the error 
– If confirmed, create a .doc (binary) file with the same feature 
and check whether that file works 
● Locate the parsing and mapping of the XML elements in the import filters to 
identify the root cause
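
The “divide and conquer” step above can be sketched as a simple halving loop. This is a minimal illustration, not a tool from the talk: `shrink` is a hypothetical helper, and `still_fails` stands in for whatever manual or scripted check tells you that a reduced document still crashes on import.

```python
from typing import Callable, Sequence


def shrink(items: Sequence[str],
           still_fails: Callable[[Sequence[str]], bool]) -> list[str]:
    """Repeatedly keep whichever half of `items` still triggers the failure."""
    current = list(items)
    while len(current) > 1:
        middle = len(current) // 2
        first, second = current[:middle], current[middle:]
        if still_fails(first):
            current = first
        elif still_fails(second):
            current = second
        else:
            break  # the failure needs content from both halves
    return current


if __name__ == "__main__":
    # Hypothetical document content: one paragraph triggers the crash.
    paragraphs = ["intro", "table", "bad shape", "chart", "footer"]
    print(shrink(paragraphs, lambda chunk: "bad shape" in chunk))
```

When the loop stops early because neither half fails on its own, the remaining content has to be reduced by hand.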
10 
Crash - Example 
Problematic xml area 
fdo#79973
11 
Resolving Crash - Example 
Code reference : https://gerrit.libreoffice.org/#/c/9840
12 
File Import – Types Of Data Loss 
● Feature loss (e.g. text, shapes) 
● Feature property loss (e.g. colors, line styles) 
● Incorrect values (e.g. shape size, position)
13 
File Import – Reasons For Data Loss 
● MS Office feature is not supported 
– Implement feature support 
– Grab-bag 
● XML Nodes not handled 
● XML elements not mapped properly 
● Properties lost in shape conversions 
(SwXShape → SwXTextFrame)
14 
File Import – How To Fix Data Loss 
● Check the XML schema of the missing feature 
● Check the ECMA-376 specs for the missing properties 
● Check that the XML properties are available in model.xml 
● Identify the LibreOffice UNO properties for the missing data 
– Insert a similar feature in LibreOffice and check which properties represent 
the missing effects 
– Create a .doc file with the same data 
– Use the XRAY tool to check the properties 
● Locate the handling of those XML properties in dmapper 
● Check that the XML values are properly mapped to UNO properties 
– Hard-code the UNO properties to verify quickly
15 
Data Loss Example - shape 
● TextBox background image loss 
[Figures: original TextBox fill; LO rendering before the fix; LO rendering after the fix]
16 
Data Loss Example - shape 
● Set the proper UNO property 
– “FillBitmapURL” property for a shape 
– “BackGraphicURL” property for a TextFrame 
● Handled the “BackGraphicURL” property in export 
when the object is a TextFrame 
Code reference : https://gerrit.libreoffice.org/#/c/7259
17 
Data Loss Example - Table 
[Figures: original table with auto width; LO rendering before the fix; LO rendering after the fix; LO export before the fix; LO export after the fix]
18 
Data Loss Example - Table 
XML comparison 
[Figures: original XML; XML exported by LO; fixed XML] 
Code references : https://gerrit.libreoffice.org/#/c/7593/ 
https://gerrit.libreoffice.org/#/c/7594/
19 
Challenges In 'File Export' 
● MS Office is not able to open the saved file 
● Data loss 
● LO crash
20 
File Export – Types Of Corruptions 
● Invalid XML values exported 
– XML values are not exported as per the ECMA specs 
– Example from the ECMA specs: valid values for rotX are in the range [-90, 90]
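
A range check like the rotX one can be scripted against an exported chart part. The following is a minimal sketch: the function name `invalid_rotx_values` and the sample XML are illustrative, and it assumes rotX appears as a `c:rotX` element with a `val` attribute in the DrawingML chart namespace.

```python
import xml.etree.ElementTree as ET

CHART_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"

# Hypothetical exported chart fragment with an out-of-range rotation.
SAMPLE_CHART = (
    f'<c:chartSpace xmlns:c="{CHART_NS}"><c:chart><c:view3D>'
    '<c:rotX val="120"/>'
    "</c:view3D></c:chart></c:chartSpace>"
)


def invalid_rotx_values(chart_xml: str, low: int = -90, high: int = 90) -> list[int]:
    """Return every rotX value that falls outside the range [low, high]."""
    root = ET.fromstring(chart_xml)
    bad = []
    for node in root.iter(f"{{{CHART_NS}}}rotX"):
        value = int(node.get("val", "0"))
        if not low <= value <= high:
            bad.append(value)
    return bad


if __name__ == "__main__":
    print(invalid_rotx_values(SAMPLE_CHART))
```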
21 
File Export – Types Of Corruptions 
● XML tag mismatch: start and end tags do not match
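
Tag mismatches make a part ill-formed, so any XML parser will report them. A minimal sketch using Python's standard parser follows; `well_formedness_error` is a hypothetical helper name and the namespace URI in the usage lines is a placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


if __name__ == "__main__":
    # The run element <w:r> is never closed, so the end tag does not match.
    broken = '<w:p xmlns:w="urn:example"><w:r></w:p>'
    print(well_formedness_error(broken))
```

The error message includes the line and column of the mismatch, which helps locate the export code that wrote the unbalanced tag.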
22 
File Export – Types Of Corruptions 
● Missing target relationship entry 
● Missing relationship file (e.g. header.xml.rels) 
● Zero-byte file exported (mostly for the contents of the images/media folder) 
Example: a relationship is referenced in header.xml, but the header.xml.rels file is missing
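
Two of these corruptions, a missing .rels part and zero-byte files, can be detected by scanning the package. The sketch below is illustrative only: the helper names are hypothetical, the sample package is built in memory, and the scan assumes relationship attributes use the conventional `r:` prefix.

```python
import io
import posixpath
import re
import zipfile

# Assumption: relationship attributes use the conventional "r:" prefix.
REL_ATTRIBUTE = re.compile(rb'\sr:(?:id|embed|link)="')


def rels_path_for(part_name: str) -> str:
    """Return where the relationship part for `part_name` is expected."""
    folder, base = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", base + ".rels")


def find_export_problems(docx_bytes: bytes) -> list[str]:
    """Report zero-byte parts and XML parts whose .rels part is missing."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        names = set(package.namelist())
        for info in package.infolist():
            if info.is_dir():
                continue
            if info.file_size == 0:
                problems.append(f"zero-byte part: {info.filename}")
            elif info.filename.endswith(".xml"):
                uses_rels = REL_ATTRIBUTE.search(package.read(info.filename))
                if uses_rels and rels_path_for(info.filename) not in names:
                    problems.append(f"missing {rels_path_for(info.filename)}")
    return problems


def build_broken_sample() -> bytes:
    """Build a hypothetical package that shows both problems."""
    header = (
        '<hdr xmlns:r="http://schemas.openxmlformats.org/officeDocument/'
        '2006/relationships"><pict r:id="rId1"/></hdr>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/header1.xml", header)
        package.writestr("word/media/image1.png", b"")
    return buffer.getvalue()


if __name__ == "__main__":
    for problem in find_export_problems(build_broken_sample()):
        print(problem)
```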
23 
File Export – Types Of Corruptions 
● Invalid hierarchy 
– A text box is exported inside another text box (marked on the slide as an Easy Hack)
24 
File Export – Corruption Issues 
MS Office seems to have an internal 
limit of 4091 styles and refuses to load 
a “.docx” file with more styles.
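
Counting the styles in an exported styles.xml shows quickly whether a file is near that limit. This is a minimal sketch: `count_styles` is a hypothetical helper, the sample XML is made up, and the 4091 figure is simply the number quoted above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STYLE_LIMIT = 4091  # figure quoted on the slide, not independently verified

# Hypothetical styles part with three style definitions.
SAMPLE_STYLES = (
    f'<w:styles xmlns:w="{W_NS}">'
    '<w:style w:styleId="Normal"/>'
    '<w:style w:styleId="Heading1"/>'
    '<w:style w:styleId="Title"/>'
    "</w:styles>"
)


def count_styles(styles_xml: str) -> int:
    """Count the w:style elements directly under the w:styles root."""
    root = ET.fromstring(styles_xml)
    return len(root.findall(f"{{{W_NS}}}style"))


if __name__ == "__main__":
    total = count_styles(SAMPLE_STYLES)
    print(total, "styles; over the limit:", total > STYLE_LIMIT)
```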
25 
Analyzing File Corruption 
● Validate the exported docx file 
– Use the Open XML SDK Productivity Tool to validate the file (Windows only) 
● Compare the content of the exported file with the original file 
– Use OOXML Tools to compare the files 
● Check the ECMA specs for the invalid XML property 
● Check that relationship IDs are exported properly 
– The relationship target is present in the rels XML file 
– The target file is available in the exported file 
● Search the export files for the code that writes the invalid XML, e.g. 
docxattributeoutput, docxsdrexport etc.
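
The relationship checks in the list above can be sketched as a script that resolves every internal relationship target and reports the ones that are absent from the package. The helper names and the in-memory sample are hypothetical; the namespace is the package relationships namespace used in .rels parts.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def missing_targets(docx_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (rels part, relationship Id, resolved target) for absent targets."""
    missing = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        names = set(package.namelist())
        for rels_path in sorted(n for n in names if n.endswith(".rels")):
            # word/_rels/document.xml.rels describes parts relative to word/
            base_dir = posixpath.dirname(posixpath.dirname(rels_path))
            root = ET.fromstring(package.read(rels_path))
            for rel in root.findall(f"{{{REL_NS}}}Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # hyperlinks and similar are not package parts
                target = rel.get("Target", "")
                if target.startswith("/"):
                    resolved = target.lstrip("/")
                else:
                    resolved = posixpath.normpath(posixpath.join(base_dir, target))
                if resolved not in names:
                    missing.append((rels_path, rel.get("Id", ""), resolved))
    return missing


def build_sample() -> bytes:
    """Build a hypothetical package where rId2 points at a missing image."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId1" Type="t" Target="media/image1.png"/>'
        '<Relationship Id="rId2" Type="t" Target="media/image2.png"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/_rels/document.xml.rels", rels)
        package.writestr("word/media/image1.png", b"\x89PNG")
    return buffer.getvalue()


if __name__ == "__main__":
    print(missing_targets(build_sample()))
```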
26 
File Export – Reasons For Data Loss 
● Features that are rendered properly are mostly 
preserved in export 
● Possible reasons for data loss: 
– Mapping of UNO properties to OOXML properties 
● Invalid data conversion (from the LO property to an XML value that is valid 
for MSO as per ECMA) 
● e.g. rotation angle, dashed borders 
– A required XML part is missing in the exported file 
● e.g. fill properties from the shape XML schema
27 
File Export - How To Fix Data Loss 
● Compare the exported and original files 
– Verify against the XML schema that the missing feature, or the properties 
of the missing feature, are exported 
● Check the export code for the missing XML part 
– Search the export classes for the XML token “XML_elementname”, e.g. 
XML_rot 
– Check that XML parts are written under the right parent 
elements
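
The comparison step above can be approximated by counting element names in the original and exported parts and reporting what disappeared. This is a rough sketch with hypothetical helper names and made-up sample XML; it compares element counts only, not attribute values or nesting.

```python
import xml.etree.ElementTree as ET
from collections import Counter

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Hypothetical shape properties: the export dropped the solid fill.
ORIGINAL = (
    f'<a:spPr xmlns:a="{A_NS}"><a:solidFill>'
    '<a:srgbClr val="FF0000"/></a:solidFill><a:ln/></a:spPr>'
)
EXPORTED = f'<a:spPr xmlns:a="{A_NS}"><a:ln/></a:spPr>'


def element_counts(xml_text: str) -> Counter:
    """Count how often each element name occurs in an XML part."""
    return Counter(node.tag for node in ET.fromstring(xml_text).iter())


def lost_elements(original_xml: str, exported_xml: str) -> dict[str, int]:
    """Return the elements that occur less often in the exported part."""
    return dict(element_counts(original_xml) - element_counts(exported_xml))


if __name__ == "__main__":
    for tag, count in lost_elements(ORIGINAL, EXPORTED).items():
        print(tag, count)
```

The reported names are in `{namespace}element` form; the local name is what to search for in the export classes.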
28 
Data Loss - Example 
● The numbered list is not preserved (numbering.xml) 
– Original XML - <w:lvlText w:val="%1" /> 
– Exported XML - <w:lvlText w:val="" /> 
[Figures: original data; export before the fix; export after the fix] 
Code reference : https://gerrit.libreoffice.org/#/c/8768/
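
A quick way to spot this kind of regression is to scan an exported numbering.xml for levels whose w:lvlText value is empty. The sketch below uses a hypothetical helper name and a made-up sample; note that an empty value is not always an error, so the output is a list of candidates to compare with the original file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical exported numbering part: level 0 lost its "%1" text.
SAMPLE_NUMBERING = (
    f'<w:numbering xmlns:w="{W_NS}">'
    '<w:abstractNum w:abstractNumId="1">'
    '<w:lvl w:ilvl="0"><w:lvlText w:val=""/></w:lvl>'
    '<w:lvl w:ilvl="1"><w:lvlText w:val="%2"/></w:lvl>'
    "</w:abstractNum>"
    "</w:numbering>"
)


def empty_level_texts(numbering_xml: str) -> list[tuple[str, str]]:
    """Return (abstractNumId, ilvl) pairs whose w:lvlText value is empty."""
    root = ET.fromstring(numbering_xml)
    found = []
    for abstract in root.iter(f"{{{W_NS}}}abstractNum"):
        abstract_id = abstract.get(f"{{{W_NS}}}abstractNumId", "")
        for level in abstract.findall(f"{{{W_NS}}}lvl"):
            text = level.find(f"{{{W_NS}}}lvlText")
            if text is not None and text.get(f"{{{W_NS}}}val", "") == "":
                found.append((abstract_id, level.get(f"{{{W_NS}}}ilvl", "")))
    return found


if __name__ == "__main__":
    print(empty_level_texts(SAMPLE_NUMBERING))
```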
29 
LibreOffice Hang Issues 
● LibreOffice hangs while opening or saving a docx file 
● Possible reasons: 
– Required UNO properties were removed 
● PROP_PARA_LINE_SPACING 
● Code reference : https://gerrit.libreoffice.org/#/c/9560 
– Some required XML attributes are not handled 
● Code reference : https://gerrit.libreoffice.org/#/c/8632/ 
– Memory leaks 
● Code reference : https://gerrit.libreoffice.org/#/c/6850
30 
Some Useful Tools 
● Xray Tool 
● OOXML Tools (Chrome Browser plug-in) 
● Open XML SDK Productivity tool. (for windows)
31 
XRAY Tool
32 
OOXML Tools developed by Atul Moglewar from Synerzip. 
● Drag and drop 
● Compare two files
33 
Open SDK Tool
34 
More Examples
35 
Chart: Wall Color 
● The wall color was missing from the exported file 
[Figures: chart with the wall color lost; chart after the fix]
36 
Chart 
[Figures: original XML for the chart wall color; LO export before the fix; LO export after the fix] 
Code references : https://gerrit.libreoffice.org/7739 
https://gerrit.libreoffice.org/7792
37 
Doughnut chart 
Original chart Before fix After fix 
Code Reference : https://gerrit.libreoffice.org/#/c/6924
38 
Exploded Pie Chart 
Original chart Before fix After fix 
Code Reference : https://gerrit.libreoffice.org/#/c/6924
39 
Shapes in header 
Before Fix After Fix
40 
Fields 
Original XML Before Fix After Fix
41 
Smart Art 
Image fills in SmartArt are exported properly. 
Original File LO Export : Before Fix After Fix 
Code reference : https://gerrit.libreoffice.org/#/c/9121
42 
Synerzip's Contribution 
● ~250 patches submitted by Synerzip in the last year. 
● 50+ crash/corruption scenarios fixed. 
● 270+ bugs filed on Bugzilla. 
● 200+ bugs resolved.
43 
Team Synerzip
44 
References 
● http://cgit.freedesktop.org/libreoffice/core/log/?qt=author&q=synerzip 
● http://msdn.microsoft.com/en-us/library/office/gg607163(v=office.14).aspx 
● http://www.ecma-international.org/publications/standards/Ecma-376.htm 
● http://www.datypic.com/sc/ooxml/ 
● https://chrome.google.com/webstore/detail/ooxml-tools/bjmmjfdegplhkefakjkccocjanekbapn? 
● https://wiki.documentfoundation.org/Macros
45
46 
Thank You.
