Modeling and Querying Greek Legislation using Semantic Web Technologies
Ilias Chalkidis
In this work, we study how one can make a particular kind of government data available as open data using semantic web technologies. We focus on Greek legislation and show how it can be modeled using ontologies expressed in the Web Ontology Language (OWL) and the Resource Description Framework (RDF), and queried using the expressive query language SPARQL. To demonstrate the applicability and usefulness of our approach we develop a web application, called Nomothesia, which makes Greek legislation easily accessible to the public. Nomothesia offers advanced services for retrieving and querying Greek legislation and is intended for the citizens through intuitive presentational views and search interfaces, but also for application developers for consuming its content through two web services: a SPARQL endpoint and a RESTful API.
A Network Analysis of Dutch Regulations - Using the Metalex Document Server
Rinke Hoekstra
In this paper we explore the possibilities of using the Linked Data representation of all Dutch regulations stored in the MetaLex Document Server for the purposes of network analysis over the citation graph between regulations, both at the document level, and at the article level. We show that this is possible using relatively straightforward SPARQL queries, and present preliminary results of the analysis.
A Network Analysis of Dutch Regulations. Rinke Hoekstra. figshare.
http://dx.doi.org/10.6084/m9.figshare.689880
Retrieved 11:12, Oct 07, 2013 (GMT)
The document discusses the Semantic Web, including its languages (RDF, RDFS, OWL), storage and querying using SPARQL, and methods for browsing and viewing semantic data through techniques like faceted browsing and Fresnel lenses. While the core technologies exist, broader adoption of the Semantic Web on the mainstream web still has challenges to overcome.
Presented in : JIST2015, Yichang, China
Prototype: http://rc.lodac.nii.ac.jp/rdf4u/
Video: https://www.youtube.com/watch?v=z3roA9-Cp8g
Abstract: The Semantic Web and Linked Open Data (LOD) are powerful technologies for knowledge management, and explicit knowledge is expected to be presented in RDF (Resource Description Framework), but ordinary users are kept away from RDF by the technical skills it requires. Since a concept map or node-link diagram can help learners from beginner to advanced level, RDF graph visualization can be a suitable tool for familiarizing users with Semantic Web technology. However, an RDF graph generated from a whole query result is not suitable for reading, because it is highly connected, like a hairball, and poorly organized. To make a graph that presents knowledge easier to read, this research introduces an approach to sparsifying a graph using a combination of three main functions: graph simplification, triple ranking, and property selection. These functions are mostly based on interpreting RDF data as knowledge units, together with statistical analysis, in order to deliver an easily readable graph to users. A prototype is implemented to demonstrate the suitability and feasibility of the approach. It shows that the simple and flexible graph visualization is easy to read and makes a good impression on users. In addition, the attractive tool helps users appreciate the advantageous role of linked data in knowledge management.
This document contains an index and summary of sections for a project on ways to prevent air pollution. The project contains 5 sections: Section A describes the concepts and search statements used; Section B summarizes 3 sources used - 2 journal articles and 1 book; Section C summarizes 6 online sources from journal articles and reports; Section D summarizes 3 journal articles; Section E lists the references used in APA style. The document provides a thorough overview and organization of the sources and information collected for a project on preventing air pollution.
Lisa Andina's document discusses the classification, nomenclature, physical properties, and reactions of alcohols and phenols. It defines primary, secondary, and tertiary alcohols based on the number of carbons bonded to the carbon that bears the -OH group. It also discusses IUPAC nomenclature rules for naming alcohols and diols. Key physical properties described include higher boiling points due to hydrogen bonding and decreasing solubility with larger alkyl groups. Common reactions include reductions, hydrations, and conversions to alkyl halides. Phenols are then contrasted, noting their different acidity and synthesis methods compared to alcohols.
Relational databases are rigidly structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and write complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we design a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge about its structure, removing the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
Malignant hyperthermia is a potentially fatal reaction to general anesthetics that causes an abnormal accumulation of calcium in muscle cells. Symptoms include muscle rigidity, increased body temperature, tachycardia, and skin changes. Treatment consists of administering dantrolene sodium to block calcium release, cooling, and discontinuing the trigger. It can be diagnosed by muscle contracture tests or biopsy, and patients should receive
Hypothermia and hyperthermia are dangerous disorders of body temperature. Hypothermia occurs when body temperature falls below 35 degrees Celsius due to exposure to a cold environment, while hyperthermia occurs when the temperature exceeds 41 degrees due to excessive exposure to heat or to illness. Both conditions require immediate treatment to stabilize body temperature and prevent
Chickenpox is a skin disease caused by the varicella virus that causes fever, fluid-filled spots on the skin, and a sore throat. The disease spreads through saliva droplets or contaminated objects. Prevention includes hygiene, nutritious food, vaccination, and avoiding sources of infection, while care involves changing clothes, cleaning the skin, and isolating the patient.
This document discusses chickenpox, including its symptoms such as fever and skin redness, its prevention through immunization for children and adults, and its treatment with the antiviral drug acyclovir in tablet and ointment form for 7-10 days.
This document discusses the 5-Star Linked Open Council Decisions (LBLOD) project. The project aims to map intermunicipal structures and publish local government decisions as linked open data. It involves stakeholders to develop a standardized data model and open methodology for local decision-making processes. The goals are to provide a central searchable repository of machine- and human-readable local decisions through a proof of concept application. The document outlines the project timeline and challenges participants to build applications using the published local decision data.
metadata & open source #osgeonl dag 2012 pvangenuchten
A presentation at the OSGeo.nl day in Velp.
Over the past year, data providers (triggered by open data, the Atlas van de Leefomgeving, and INSPIRE) have worked hard to get their data and metadata online. Now it is the turn of client software to get to grips with the young standards in use, so that the data can be opened up optimally. Several use cases are presented showing how this could be done.
M4B presentation at the e-commerce day organized by the VVB for booksellers who would like to start a webshop. This presentation is about the data (or metadata) that can be obtained from our book database, the largest in the Low Countries.
TYPO3 Congres 2012 - Getting started with TYPO3 Extbase and Fluid
TYPO3 Nederland
With the launch of the newest TYPO3 CMS versions, working with Extbase and Fluid is becoming increasingly important. If you still develop extensions based on pi_base, or you would like to start developing extensions but find the threshold too high, this session teaches you the basics of an Extbase / Fluid extension and shows how easy it is to develop extensions.
Henjo Hoeksma
After working with TYPO3 as a hobby for several years, Henjo made developing websites and web applications based on the TYPO3 framework his profession. After a short period working as a developer at alterNET and at an international LED lighting company, he started working as a freelancer.
With a passion for new techniques, high-quality solutions and code, and the TYPO3 products and community, he supports both large and smaller organizations from his company Stylence in developing websites and custom solutions.
Recap of RoboCon 2020 Helsinki.
It covers the new 2.3 version of Robot Framework. It also looks at the state of affairs around the open source RPA community.
ADO ActiveX data objects 1st Edition Jason T. Roff
Presented at SMWCon Fall 2017 (https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2017/Implementing_Rule-based_Systems_with_Semantic_MediaWiki)
This talk presents an application of Semantic MediaWiki to assessing student deliverables for both substantive feedback and official grades. This system offers several features that streamline different aspects of the grading process. These include determining feedback, calculating grades and performing simple learning analytics. The wiki uses Semantic Forms for efficient entry of feedback by the grader. This approach processes machine-readable student deliverables to propose some feedback to the human grader as prefilled wiki form code. The grader can then use forms to confirm or adapt the computer-generated feedback and to add more feedback. Once an assessment is complete, the wiki calculates a proposed grade for the grader to confirm or adapt. The wiki then also generates an email link with feedback for the student. Finally, the wiki provides some simple learning analytics about the feedback and grades to assist the grader in evaluating and adapting the assessment system.
Remco van Veenendaal (Nationaal Archief): persistent identifiers. Knowledge afternoon on sustainable access with persistent identifiers and Linked Open Data, 28 June 2018, Netwerk Digitaal Erfgoed.
This document outlines a course on Knowledge Representation (KR) on the Web. The course aims to expose students to challenges of applying traditional KR techniques to the scale and heterogeneity of data on the Web. Students will learn about representing Web data through formal knowledge graphs and ontologies, integrating and reasoning over distributed datasets, and how characteristics such as volume, variety and veracity impact KR approaches. The course involves lectures, literature reviews, and milestone projects where students publish papers on building semantic systems, modeling Web data, ontology matching, and reasoning over large knowledge graphs.
Managing Metadata for Science and Technology Studies: the RISIS case
Rinke Hoekstra
Presentation of our paper at the WHISE workshop at ESWC 2016 on requirements for metadata over non-public datasets for the science & technology studies field.
The document describes an analysis of 177 scientific workflows from Taverna and Wings systems. The analysis identified common "motifs" in workflows, including data-oriented motifs characterizing common data activities, and workflow-oriented motifs characterizing how activities are implemented. These motifs could help inform workflow design and the creation of automated tools to generate workflow abstractions, in order to facilitate understanding and reuse of workflows.
QBer is a tool that connects individual researchers' data to a structured data hub in the cloud, allowing them to augment and link their datasets according to community best practices, share machine-readable codebooks, and publish standards-compliant reusable datasets. It envisions growing a giant graph of interconnected datasets by using web-based linked data. The structured data hub would include external linked data and standard vocabularies to map variables and identifiers across datasets for individual researchers.
The document summarizes the JURIX 2014 conference. It provides details on the location in Krakow, Poland and lists the chairs and topics to be covered on the first day of the conference. Over 50 papers were submitted and 30 were accepted for presentation. The conference included full papers, short papers, posters and demos covering theory, technology and applications of AI and law. There were over 100 registered participants for the event.
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Rinke Hoekstra
The document summarizes a converts' rally held at Carnegie Hall in New York City on September 14, 1908 by the Evangelistic Committee. It discusses ingredients for publishing open data, including using URIs, versioning, repeatable transformations, choosing an appropriate level of detail, combining vocabularies, contextualizing information, and provenance. Provenance, or the origin and history of data, is a key issue in publishing open government data and builds trust for application developers and the public. Standards like the W3C PROV ontology can help represent provenance.
Prov-O-Viz is a visualisation service for provenance graphs expressed using the W3C PROV vocabulary. It uses the Sankey-style visualisation from D3js.
See http://provoviz.org
Linkitup: Link Discovery for Research Data
Rinke Hoekstra
Linkitup is a Web-based dashboard for enrichment of research output published via industry grade data repository services. It takes metadata entered through Figshare.com and tries to find equivalent terms, categories, persons or entities on the Linked Data cloud and several Web 2.0 services. It extracts references from publications, and tries to find the corresponding Digital Object Identifier (DOI). Linkitup feeds the enriched metadata back as links to the original article in the repository, but also builds a RDF representation of the metadata that can be downloaded separately, or published as research output in its own right. In this paper, we compare Linkitup to the standard workflow of publishing linked data, and show that it significantly lowers the threshold for publishing linked research data.
Linked (Open) Data - But what does it buy me?
Rinke Hoekstra
The document discusses linked open data and some of the challenges with implementing it. It notes that data needs to be converted to RDF and published on the web with an open license. It also discusses concerns about people drawing incorrect conclusions from data and potential privacy issues if data is combined. The document advocates for making the transformation of data to linked open data repeatable, choosing appropriate levels of detail, using multiple vocabularies and identifiers, adding context to information, and adding provenance details.
Linked Science - Building a Web of Research Data
Rinke Hoekstra
The document discusses allegations of scientific misconduct against Dutch psychologist Diederik Stapel. It describes how Stapel is accused of fabricating research results and falsifying data in multiple studies. A commission was formed to investigate the accusations, which found that Stapel had likely fabricated all of his published work. The misconduct has prompted concerns about the oversight of research in the Netherlands and calls for reform.
The document discusses the Data2Semantics project. It aims to build useful services and tools for data publishers that maintain provenance information and cater for the entire research cycle, including a feedback loop to new research. One use case presented is developing a VIVO installation to demonstrate collaboration within a research community and integrate project results with the collaboration network. Future work discussed includes improving metadata extraction, ingesting additional content, developing shared ontologies between installations, and implementing reward mechanisms for individual authors.
This document discusses using semantic web technologies like linked data to improve the sharing, analysis, and reuse of research data. It describes two projects aimed at applying these techniques: the CEDAR project, which semantically links historical census data from the Netherlands, and the LarKC project, which develops a scalable linked data analysis pipeline. Key challenges discussed include building tools to help data publishers while maintaining provenance, accommodating the full research lifecycle, and overcoming heterogeneity to enable querying across datasets. TabLinker is presented as a tool to help with the (semi-)automatic conversion and harmonization of heterogeneous tabular data into linked open data.
The Data2Semantics project (COMMIT P23) is all about enriching research data and making it more reusable for future research. Using Linked Data for this task is a fairly obvious step to make (surprise!). However, there are several shortcomings in the current practices of publishing Linked Data that call for a slightly different approach, which (hopefully) bridges a gap between Web 2.0 and Web 3.0. I will present a proof-of-concept service (Linkitup) that works on top of existing scientific data repositories, and allows individual researchers to enrich their data with additional (linked) metadata.
The document discusses the problem of knowledge acquisition in artificial intelligence. It describes knowledge acquisition as the critical bottleneck problem that has hindered the development of successful applied AI. The document outlines several historical problems with knowledge engineering, including a lack of hardware and trained knowledge engineers. It also discusses various methods that were developed to help with the process of eliciting and acquiring knowledge from experts, such as repertory grids, think aloud methods, and card sorting. Modern approaches and methodologies for building ontologies are also covered, such as CommonKADS and METHONTOLOGY.
Talk about the use of Linked Data in historical research on census data. Has some slides about TabLinker as well (http://github.com/Data2Semantics/TabLinker). Part of the Data2Semantics project (http://data2semantics.org)
Presentation for the Belastingdienst (the Dutch tax administration) as part of a study into the (im)possibilities of recognizing and extracting concepts and their definitions, and representing them with Semantic Web standards.
The document describes the Semantic Web languages including the Resource Description Framework (RDF), RDF Schema, and SPARQL query language. It provides an overview of these key languages for representing and querying linked data on the web according to common standards. The diagram shows the "Linking Open Data cloud" which depicts the growth in structured datasets and links between datasets published in RDF on the public web.
The document discusses the Semantic Web and Linked Data. It provides an overview of RDF syntaxes, storage and querying technologies for the Semantic Web. It also discusses issues around scalability and reasoning over large amounts of semantic data. Examples are provided to illustrate SPARQL querying of RDF data, including graph patterns, conjunctions, optional patterns and value testing.
History of Knowledge Representation (SIKS Course 2010)
Rinke Hoekstra
The goal of AI research is the simulation and approximation of human intelligence by computers. To a large extent this comes down to the development of computational reasoning services that allow machines to solve problems. Robots are the stereotypical example: imagine what a robot needs to know before it is able to interact with the world the way we do? It needs to have a highly accurate internal representation of reality. It needs to turn perception into action, know how to reach its goals, what objects it can use to its advantage, what kinds of objects exist, etc.
The field of knowledge representation (KR) tries to deal with the problems surrounding the incorporation of some body of knowledge (in whatever form) in a computer system, for the purpose of automated, intelligent reasoning. In this sense, knowledge representation is the basic research topic in AI. Any artificial intelligence is dependent on knowledge, and thus on a representation of that knowledge. The history of knowledge representation has been nothing less than turbulent. The roller coaster of promise of the 50's and 60's, the heated debates of the 70's, the decline and realism of the 80's and the ontology and knowledge management hype of the 90's each left a clear mark on contemporary knowledge representation technology and its application.
The document discusses design patterns for ontologies. It notes that certain patterns are more useful than others and that design patterns can capture fundamental design decisions and recurrent structures that reflect cognitive notions. Design patterns can bridge the gap between conceptual models and implementation and provide insights into expert knowledge. The document advocates moving beyond best practices to analyze design decisions in existing ontologies and assessing tradeoffs to discover more patterns.
2. The Problem
• Knowledge
• Provenance
[Figure: Regulation A (01-01-2011); Art 12 (04-02-2011); Art 14, lid 3, 2e volzin (11-06-2008); Art 14, lid 3, 2e volzin (01-07-2011)]
• Open Data: public service falls short
• Large scale validation of CEN MetaLex
• “Linked Open Government Data”
3. Current Situation
Public content services hosted at wetten.nl
4. Wetten.nl XML Service
http://wetten.overheid.nl/xml.php?regelingID=...
• Only available format is BWB XML
• Only current version
• Content at document level
• Identification at document level
• Identifiers are not dereferenceable
• Hardly any metadata (e.g. version date)
• Only available context is position in text
6. Identifiers & Juriconnect
1.0:c:BWBR0005416&artikel=6
vs
http://wetten.overheid.nl/cgi-bin/deeplink/law1/bwbid=BWBR0005416/article=6/date=2005-01-14
vs
http://wetten.overheid.nl/BWBR0005416/TitelII698946/HoofdstukII/Artikel16/geldigheidsdatum_14-01-2005
• Juriconnect?
• URN-based... but no naming server
• (cf. Digital Object Identifiers)
• Named elements do not carry identifier
• No explicit version information, only contextual
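To make the contrast between the three reference styles concrete, here is a minimal sketch that splits the Juriconnect-style string shown above into its parts. The field names (`version`, `type`, `id`, `params`) and the function `parse_reference` are hypothetical labels chosen for illustration, and the parsing rules are inferred only from the single example on this slide, not from the Juriconnect specification.

```python
def parse_reference(ref):
    """Split a reference such as '1.0:c:BWBR0005416&artikel=6'.

    Assumed layout (inferred from the one example on the slide):
    <version>:<type>:<regulation id>&<key>=<value>&...
    """
    version, kind, rest = ref.split(":", 2)
    parts = rest.split("&")
    # Everything after the regulation id is treated as key=value pairs.
    params = dict(p.split("=", 1) for p in parts[1:])
    return {"version": version, "type": kind, "id": parts[0], "params": params}


if __name__ == "__main__":
    print(parse_reference("1.0:c:BWBR0005416&artikel=6"))
```

Note that nothing in such a reference says which version of the article is meant, which is the point the slide makes about version information being only contextual.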
7. Sources used...
• List of all regulations in “XML”
• Wetten.nl XML Service
• Metadata in HTML table on wetten.nl
(the “info page”)
• ... so let’s get started already
9. Our Goals
• “Deserialize” regulation content
(e.g. topic-based browsing)
• Extract and reconstruct implicit information
(identifiers, metadata)
• Annotate regulations
(reconstructed metadata, third-party metadata)
• Annotate using regulations
(knowledge based systems, services, business processes ...)
• Accessible and reusable for any other party
(shared vocabularies, standard access)
10. Requirements
• Unique, persistent identification
• Generic XML structure of documents
• Extensible metadata framework
• Flexible web services
11. Technology Choices
• URL-like URIs
• CEN MetaLex XML documents
• Linked Data / RDF metadata
(extensibility to OWL, RIF)
• Transparent REST-services
12. Step 2
Come up with persistent identifiers at
element level and a solid versioning scheme
13. Identification
• Web-enabled “URL-like” URIs
• e.g. http://doc.metalex.eu/....
• “Cool” URIs (http://www.w3.org/TR/cooluris/)
• “Accept”-header based dereferencing
• Different types of content at same URI
14. Levels of Identification
• IFLA FRBR levels
• Work
• Expression
• Manifestation
[Figure: FRBR levels of a bibliographic entity. An Expression (version of regulation) realizes a Work (regulation), a Manifestation (XML version of regulation) embodies an Expression, and an Item (XML version of regulation on my harddisk) exemplifies a Manifestation.]
15. Transparent Identifiers
• Hierarchical information (work)
http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1
http://doc.metalex.eu/id/BWBR0011823/artikel/1
• Version and language (expression)
http://doc.metalex.eu/id/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01
• Format information (manifestation)
http://doc.metalex.eu/doc/BWBR0011823/hoofdstuk/1/artikel/1/nl/2010-09-01/data.xml
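The URI patterns above can be sketched as three small helper functions. The function names and parameters are illustrative assumptions; only the URI shapes themselves are taken from the examples on this slide.

```python
BASE = "http://doc.metalex.eu"


def work_uri(bwb_id, path=()):
    """Work level: regulation id plus an optional hierarchical path."""
    return "/".join([BASE, "id", bwb_id, *path])


def expression_uri(bwb_id, path, lang, date):
    """Expression level: the work URI extended with language and version date."""
    return "/".join([work_uri(bwb_id, path), lang, date])


def manifestation_uri(bwb_id, path, lang, date, filename="data.xml"):
    """Manifestation level: '/doc/' instead of '/id/', plus a file name."""
    return "/".join([BASE, "doc", bwb_id, *path, lang, date, filename])


if __name__ == "__main__":
    article = ("hoofdstuk", "1", "artikel", "1")
    print(work_uri("BWBR0011823", article))
    print(expression_uri("BWBR0011823", article, "nl", "2010-09-01"))
    print(manifestation_uri("BWBR0011823", article, "nl", "2010-09-01"))
```

Because each level only appends to the previous one, a client can move from a manifestation back to its expression or work by trimming path segments, which is what makes the identifiers "transparent".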
16. Problem
• URIs don’t carry semantics...
• Detect changes:
• which element versions are the same
• ... and which versions are different?
Art. 44, lid 4
(2011-03-26)
Art. 44, lid 4
(2011-04-05)
from: Besluit prudentiële regels Wft, BWBR0020420
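One possible way to decide which element versions are the same is to compare fingerprints of their text. The sketch below is an assumption for illustration, not necessarily how the MetaLex Document Server does it; `fingerprint` and `changed_versions` are hypothetical names, and the sample texts are invented.

```python
import hashlib


def fingerprint(text):
    """Hash of the text with whitespace differences normalized away."""
    normalized = " ".join(text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def changed_versions(versions):
    """Return the dates at which an element's text actually changed.

    `versions` is a list of (date, text) pairs sorted by date. The first
    version always counts as a change, since nothing precedes it.
    """
    changed = []
    previous = None
    for date, text in versions:
        current = fingerprint(text)
        if current != previous:
            changed.append(date)
        previous = current
    return changed


if __name__ == "__main__":
    history = [
        ("2011-03-26", "Example text of the paragraph."),
        ("2011-04-05", "Example  text of the paragraph."),  # whitespace only
        ("2011-07-01", "Amended text of the paragraph."),
    ]
    print(changed_versions(history))
```

Normalizing whitespace before hashing is a design choice: it treats purely typographical differences between two published versions as "the same" element version.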
19. Procedure
For each BWB XML file listed,
if update has occurred since latest run,
download latest version,
scrape metadata, and
produce:
Persistent URIs
CEN MetaLex + Citations
Inline RDFa (optional) or RDF graph (optional),
Pajek “.net” files (optional)
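The procedure above can be sketched as a simple incremental loop. This is a minimal illustration under stated assumptions: `run_conversion` and its parameters are hypothetical, and the download, metadata scraping and conversion steps are passed in as functions because the slide does not describe how they are implemented.

```python
from datetime import date


def run_conversion(listing, last_run, download, scrape_metadata, convert):
    """Convert every regulation updated since the latest run.

    `listing` is an iterable of (bwb_id, last_modified) pairs; the three
    callables stand in for the download, scraping and conversion steps.
    """
    results = {}
    for bwb_id, modified in listing:
        if modified <= last_run:
            continue  # no update since the latest run, skip
        xml = download(bwb_id)
        metadata = scrape_metadata(bwb_id)
        results[bwb_id] = convert(xml, metadata)
    return results


if __name__ == "__main__":
    # Invented sample data; the ids and dates are placeholders.
    listing = [("BWBR0000001", date(2011, 1, 1)), ("BWBR0000002", date(2011, 3, 1))]
    converted = run_conversion(
        listing,
        last_run=date(2011, 2, 1),
        download=lambda bwb_id: "<xml/>",
        scrape_metadata=lambda bwb_id: {"id": bwb_id},
        convert=lambda xml, metadata: (xml, metadata),
    )
    print(sorted(converted))
```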
20. CEN MetaLex
• Straightforward 1:1 mapping
• ... some minor fixes
• Mint URIs on the fly
• Convert citations on the fly
• Generate metadata on the fly
• “inline” inside mcontainer elements
21. Results
(full description in draft ISWC 2011 paper)
Table 1. Conversion performance for 300 randomly selected regulations.
Substitutions   Number   %      Corrections   Number   %
container       22312    29 %   artikel       2525     72 %
hcontainer      3730     5 %    divisie       519      15 %
htitle          3730     5 %    colspec       289      8 %
block           34325    44 %   illustratie   54       2 %
inline          13527    17 %   others        99       3 %
Total           77624           Total         3486
Total no. of regulations   300
Revoked regulations        109   30 %
Correction %               4 %
Lastly, the MDS offers a simple search interface for finding regulations based on the title and version date.
6 Conclusion and Results
We ran the MetaLex conversion script on all regulations available through the wetten.nl portal, resulting in a total of 27.687 versions of regulations being con-
23. Metadata Vocabularies
• “RDFized” BWB elements
• MetaLex ontology
• FRBR type, modification events, structure
• Dublin Core
• title, alternativeTitle, version
• FOAF
• page, homepage
• Simple Event Model (SEM)
• Open Provenance Model vocabulary (OPMV)
• W3C Time Ontology
25. Events & Provenance
[Figure: RDF graph connecting four resources]
• The expression (version) URI of a regulation
http://doc.metalex.eu/id/BWBR0017869/2009-10-23
• The creation event of the regulation
http://doc.metalex.eu/id/event/BWBR0017869/2009-10-23
• The process that generated the expression
http://doc.metalex.eu/id/process/BWBR0017869/2009-10-23
• The date at which the expression was created
http://doc.metalex.eu/id/date/2009-10-23 ("2009-10-23"^^xsd:date)
• Classes shown: ml:BibliographicExpression, ml:LegislativeModification, ml:Date, sem:Event, sem:Time, time:Instant, opmv:Artifact, opmv:Process
• Properties shown: rdf:type, rdf:value, ml:resultOf, ml:date, sem:eventType, sem:timeType, sem:hasTime, sem:hasTimeStamp, time:hasEnd, time:inXSDDateTime, opmv:wasGeneratedBy, opmv:wasGeneratedAt
27. Document Serving
• RESTful API
• Implement Cool URIs
(Dereference to XML, RDF, .net)
• Shorthands (‘/latest’)
• SPARQL endpoint
• Citation graphs
• Rudimentary (and unpredictable) search
• CSS Stylesheet for CEN MetaLex XML
28. Dereferencing (RDF)
1 Client requests URI http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01
(Accept: application/x-turtle)
2 Server redirects to manifestation URI (HTTP 303)
http://doc.metalex.eu/doc/BWBR0011823/nl/2010-09-01/data.ttl
3 Server queries triplestore for Symmetric Concise Bounded Description (SCBD) using a SPARQL query
http://www.w3.org/Submission/CBD
4 Triplestore returns SCBD (JSON serialisation of SCBD)
5 MDS returns Turtle (file containing Turtle serialisation of SCBD)
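The triplestore query in step 3 could look roughly like the one built below. This is a simplified sketch, not the server's actual query: it collects only the triples in which the resource is the subject or the object, and omits the recursive expansion of blank nodes that the full CBD definition asks for. The function name `scbd_query` is hypothetical.

```python
def scbd_query(uri):
    """Build a SPARQL CONSTRUCT query for a one-level symmetric description.

    Simplification: outgoing and incoming triples only, without the
    blank-node closure of the full Concise Bounded Description.
    """
    # Reject characters that could break out of the <...> IRI syntax.
    if any(c in uri for c in '<>"{}') or any(c.isspace() for c in uri):
        raise ValueError("not a safe URI: %r" % uri)
    return (
        "CONSTRUCT { <%(u)s> ?p ?o . ?s ?q <%(u)s> . }\n"
        "WHERE { { <%(u)s> ?p ?o } UNION { ?s ?q <%(u)s> } }" % {"u": uri}
    )


if __name__ == "__main__":
    print(scbd_query("http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01"))
```

Each solution binds only one side of the UNION, and a CONSTRUCT template triple with an unbound variable is simply skipped, so the result contains exactly the outgoing and incoming triples.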
29. Dereferencing (XML)
1 Client requests URI http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01
(Accept: text/xml)
2 Server redirects to manifestation URI (HTTP 303)
http://doc.metalex.eu/doc/BWBR0011823/nl/2010-09-01/data.xml
3 Server queries file store for XML manifestation (manifestation glob)
4 If no manifestation exists, extract from parent
5 Triplestore returns URI of manifestation
6 MDS redirects to manifestation URI (HTTP 302), the location of the manifestation
http://doc.metalex.eu/files/BWBR0011823_2010-03-01_mls.xml
(Clients may render XML using CSS stylesheet)
30. Dereferencing (...)
• Other RDF syntaxes
application/rdf+xml, text/rdf+n3
• HTML clients
application/xml, application/xhtml+xml, text/html
• Redirect (303) to Marbles browser
• Pajek clients
text/plain
• Download .net file
• View using Gephi Toolkit
http://gephi.org
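The "Accept"-header based redirects described on the dereferencing slides can be sketched as a lookup from media type to manifestation file name. The names `FILENAMES` and `redirect_target` are hypothetical; `data.ttl` and `data.xml` appear on the slides, while `data.net` for Pajek clients is an assumption. The sketch looks only at the first media type in the header and ignores q-values, and it leaves out the HTML case (redirect to the Marbles browser), whose target URL the slides do not give.

```python
ID_PREFIX = "http://doc.metalex.eu/id/"
DOC_PREFIX = "http://doc.metalex.eu/doc/"

# Media type -> manifestation file name.
FILENAMES = {
    "application/x-turtle": "data.ttl",
    "text/xml": "data.xml",
    "text/plain": "data.net",  # assumed file name for Pajek clients
}


def redirect_target(id_uri, accept):
    """Return (status, location) for an identifier URI, or None if unsupported."""
    if not id_uri.startswith(ID_PREFIX):
        raise ValueError("not an identifier URI: %r" % id_uri)
    # Simplification: use the first listed media type, ignore parameters.
    media_type = accept.split(",")[0].split(";")[0].strip()
    filename = FILENAMES.get(media_type)
    if filename is None:
        return None
    return 303, DOC_PREFIX + id_uri[len(ID_PREFIX):] + "/" + filename


if __name__ == "__main__":
    uri = "http://doc.metalex.eu/id/BWBR0011823/nl/2010-09-01"
    print(redirect_target(uri, "application/x-turtle"))
    print(redirect_target(uri, "text/xml"))
```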
31. Technical Details
• Current situation
• +/- 27 thousand regulations
• 87.9 million triples (legislation.gov.uk: 1.9 billion)
• Updated daily
• Technical details
• Dell PowerEdge II T110, 32GB RAM
• Garlik 4Store triplestore (http://4store.org)
• Python Django web applications
• Tomcat servlet + Gephi Toolkit API
• See http://doc.metalex.eu
32. Step 5
Use: social network analysis and concept
extraction (ongoing work)
33. Network Analysis
• Impact of regulation on other
regulations
(combine with work on court rulings)
• Connectedness
• “Importance” of articles
• Analysis tools
• Pajek, Gephi
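As a rough illustration of this kind of analysis, the sketch below counts incoming citations as a naive "importance" measure and writes a citation graph in Pajek's ".net" format so that it can be opened in Pajek or Gephi. The function names and the sample edges are invented for illustration; in-degree is only one of many possible measures and is not claimed to be the one used in this work.

```python
from collections import Counter


def in_degree(citations):
    """Count how often each node is cited; `citations` holds (source, target) pairs."""
    return Counter(target for _, target in citations)


def to_pajek(citations):
    """Serialize a directed citation graph as Pajek .net text."""
    nodes = sorted({node for edge in citations for node in edge})
    index = {node: i for i, node in enumerate(nodes, start=1)}  # Pajek ids start at 1
    lines = ["*Vertices %d" % len(nodes)]
    lines += ['%d "%s"' % (index[node], node) for node in nodes]
    lines.append("*Arcs")
    lines += ["%d %d" % (index[s], index[t]) for s, t in citations]
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented sample graph: A and C both cite B, and A cites C.
    edges = [("A", "B"), ("C", "B"), ("A", "C")]
    print(in_degree(edges).most_common(1))
    print(to_pajek(edges))
```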
36. Concepts & Definitions
• Find explicit definitions (Emile de Maat)
For the purposes of this law,
a foster child is not considered to be a child.
• Find implicit mentions
At any one time, a person can only have one partner.
• Create RDF SKOS vocabulary
(http://www.w3.org/TR/skos)
• Connect to Cornetto Wordnet thesaurus
• Connect to MetaLex identifiers
39. Future Work
• Improve search capabilities
(Apache Lucene)
• Extend types of documents
• National (official publications)
• International (NIR, Akoma Ntoso, CHLexML)
• Empirical study of network analysis
• More concept extraction