LinkingDataInEurope
Contents
- 1 Agenda, Meetings, events
- 1.1 Meeting in physical venue in 2024
- 1.2 Webinar : November 16th, 10:00 CEST
- 1.3 Webinar November 6th, 14h30 CEST
- 1.4 October 2nd, 14:30 CEST, zoom
- 1.5 September 21st
- 1.6 September 6th 2023 : Mid term dissemination event
- 1.7 monday july 3rd
- 1.8 thursday june 15th
- 1.9 monday june 5th
- 1.10 thursday may 18th
- 1.11 thursday april 20th
- 1.12 monday april 3rd
- 2 D1 : Use case definition
- 3 Internship
- 4 D1-D4 : Gathering knowledge for the KG : specific assets, their (rdf) metadata
- 5 D3 : existing ontologies
- 6 D4 : Core KG model, list of digital assets and their metadata to integrate in the KG
- 7 Project factsheet
Agenda, Meetings, events
Webinars are generally held every first Monday of the month from 14:30 to 15:30 CEST, and every third Thursday from 10:00 to 11:00 CEST.
Meeting in physical venue in 2024
Please share your ideas ...
When : if standalone, June sounds nice.
Possible standalone venues : Paris, Madrid.
Attach to an event (fully open or combined) : a FAIR-related event? An OGC-related event? A Web-related event? The UN GGIM WG on data integration (https://un-ggim-europe.org/working-groups/working-group-data-integration/)?
Webinar : November 16th, 10:00 CEST
Zoom : contact B. Bucher to get the permanent link
Webinar November 6th, 14h30 CEST
Work on the project mid term report
October 2nd, 14:30 CEST, zoom
- Participants : PekkaL, PeterP, DavidP, LuisML, DimitrisK, BenedicteB
- Planning the publication of our mid-term report as an official EuroSDR publication : the report is being written collaboratively. This kind of publication suits our specific context: we need to share technical messages, and deliverables, with the relevant people within the organisations that read EuroSDR publications. It will also be a milestone to frame the experiment in such a way that we all understand how to contribute, in particular those who can recruit master students (a fuzzy job is not very attractive for master students). Scientific publications in journals can be targeted later.
The intro will set the general context and references. The next section will report on the Metadata Matters event, which was an opportunity to get an update on the adoption of metadata, and in particular of the RDF format and linked data technologies for metadata. In that section participants can detail the aspects they especially appreciated. (Erwin, Lexi, Pekka, Bénédicte, Audrey, Marjan, … ?) See https://www.pldn.nl/wiki/Metadata_Matters,_What_Role_Does_Linked_(Meta)Data_Play_in_the_Geospatial_World%3F_%E2%80%93_6_September_2023,_Sound_and_Vision,_Hilversum,_the_Netherlands for the agenda and slides. Another section will frame the experiment. Firstly, argue the GUF (geospatial user feedback) use case : why is sharing geospatial user feedback relevant, and in what sense can it benefit from a EuroSDR KG about digital assets. Secondly, write down some more specific scenarios we wish to test : User1 is assisted in writing their GUF, then User2 queries the KG to retrieve relevant GUF and adapt it to their own context. Thirdly, list the relevant sources (of metadata, of vocabularies) to be considered. We wish to have a first version by November 15th to be sent to reviewers. A final structure is proposed (summarized above), and people can then offer to contribute to one section or another. A contribution can also be proof-reading the report to make it more legible, as the topic is quite abstract and complex. Action : everyone who wishes to contribute contacts BB to be invited.
- Planning a physical event in 2024 : when, where ... Action : BB launches a poll within the group to decide between the different options
- option 1 : a standalone meeting, for example in Paris or Madrid (others?). When : June would be nice, so that the intern has started to work.
- option 2 : attach to another event and possibly open the venue. Possible events to attach to : AGILE conference, OGC metadata meeting, …
- Other matters : switch from the wiki to a git platform where we can share code and vocabularies, while keeping a wiki if needed. Action : find a shorter name for the git platform than geometadatalabs. Ideas?
Meetings : two calls per month is a bit too many, go back to monthly calls
- Review the slides for next delegates meeting
September 21st
- Zoom
- Discussion on the need for another platform to work together and share schemas, git?
- Use a EuroSDR publication or a scientific publication to publish a mid-term report describing the experiment
- User feedback could also relate to access to a specific product, e.g. download services, documentation, ...
September 6th 2023 : Mid term dissemination event
Location : Hilversum (the Netherlands) Joint event https://www.pldn.nl/wiki/Metadata_Matters,_What_Role_Does_Linked_%28Meta%29Data_Play_in_the_Geospatial_World%3F_%E2%80%93_6_September_2023,_Sound_and_Vision,_Hilversum,_the_Netherlands
monday july 3rd
cancelled
thursday june 15th
participants : Pekka, Bénédicte
- The timing of the webinar makes it difficult for many participants to join; new slots need to be decided. The next webinar will be on July 3rd, 14:30 CEST, and we will decide then.
- An important event is the mid-term workshop which takes the form of a joint symposium in Hilversum : https://www.pldn.nl/wiki/Metadata_Matters,_What_Role_Does_Linked_%28Meta%29Data_Play_in_the_Geospatial_World%3F_%E2%80%93_6_September_2023,_Sound_and_Vision,_Hilversum,_the_Netherlands
- To register for September 6th : https://docs.google.com/forms/d/e/1FAIpQLSceWufDZKlo5J-DVcivK8EfnafO-bybGTsCLMU3ZQhD1WYcTw/viewform
- Need to update the user stories on the project website (this wiki).
- Discussion on the user stories :
- assisting in the publication/enrichment of metadata into the KG through an HTML form. BB checks whether the form can be hosted on pldn, so that it could then be embedded in different websites, such as the EuroSDR website directly. Kind of metadata to enrich : the API to access the data.
- assisting in sharing user feedback : focus on integration, i.e. how datasets can be integrated, with which tools, libraries and possibly ad hoc software
monday june 5th
participants : Pekka, Hara, Bénédicte, Marjan
Action by mid-June : describe the use cases in more detail, then work on the vocabularies and services required to implement them.
Discussion on use cases. The use cases are there to engage a real dialogue between KG specialists and the people who hold the knowledge related to the different geodata products in Europe. These people are represented in the LD group, and we want to design specific use cases on which we can perform an evaluation (identify the specific products, links, etc. that must be integrated in the KG, then produce the corresponding items of the KG, test the SPARQL queries ...). BB needs to update the GMLabs page related to use cases (action BB).
More specific discussion on two kinds of user feedback.
- Feedback related to integrating different data sources to build a digital twin : this experience was described by NBus, and we wish to produce the relevant items in the KG in order to support the sharing of this feedback and, for Hara and Pekka, the evaluation of how to transpose this experience to their specific countries/cities.
- Feedback related to computing sustainability indicators based on data from different sources : here Pekka could describe how to integrate topographic and statistical data, which is often done for indicators, and Bénédicte could engage some master students, during a course on SDI, to describe how they prototype indicators by integrating data.
thursday may 18th
Participants : Pekka, David, Bénédicte
More specific discussion on the use cases, to foster the writing of the paper; a first draft was examined. Use cases : sharing user feedback about using geodata; use real applications we are working on; the geodata do not need to be linked data.
thursday april 20th
Participants : Pekka, Nicolas, Bénédicte
Discussion around use cases : sharing user feedback should be identified as a use case in itself, especially for those users who need to develop advanced knowledge about the links, so to speak, between data products (the fact that they integrated a dataset from product A with a dataset from product B in their application).
Discussion on relevant ontologies, models, vocabularies to consider in the KG
monday april 3rd
Participants : Lexi, Bénédicte, Marjan, Lars, David
Summary:
- recruiting is ongoing with a good candidate for the position 2 to be discussed between Lexi and Dimitris. Could start May. Do not use the project budget.
- Action BB: get back to participants to see if they have students, or the possibility to get someone, to work on feeding the KG with existing metadata, possibly with a focus different from the wizard development.
- Discussion on use cases : they should describe both the production of the KG and its exploitation. The two are intertwined, as the production must address the specific exploitation.
- Exploitation : distinguish end users who benefit from the KG (UC2) from KG experts who interface them with the KG technology (UC2bis).
- Use case Discovery end users : a user wishes to discover existing assets. We need to be more detailed in the writing of this use case.
- Use case Discovery tool designer : specialists working on discovery technologies and catalogues make use of the EuroSDR KG to improve their systems.
- Production : previous work within the EuroSDR LD group studied how to get metadata providers to generate the KG directly from existing metadata (transformation and loading on a sandbox). We propose instead to produce core statements for the KG, which will be enriched by such metadata files. We also propose to distinguish the identification of knowledge and metadata by data experts from the production of the KG triples by linked data specialists.
- Use case Metadata experts : metadata experts who know the data products and the main models of their domain, but who cannot cope with a KG directly, contribute through a wiki interface by listing items to be integrated in the KG
- Use case KG builder : specialists who do not know the domain-specific metadata or the data products, but who can transform structured information into RDF statements and identify the correct vocabularies from the Web when the domain has no such vocabulary yet. KG specialists produce triples based on the items provided by metadata experts.
D1 : Use case definition
Pending : identify countries and contacts, as well as representative users; specify possible architectures for each use case, not to decide yet but to ensure the intern positions have no ambiguity left
Principal UseCase : Publication and sharing of Geospatial User Feedback
A user writes down some feedback on using some data, possibly several datasets together, and checks whether other users had similar feedback. Another user is interested in browsing user feedback that may be relevant to their context, either because the feedback relates to an asset similar to the one they are downloading, or because the feedback relates to an application (user objective) similar to their own.
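The second half of this scenario (browsing feedback by shared asset or by similar application) can be sketched as follows. This is only an illustration under stated assumptions: the `Feedback` record, its fields and the `relevant_feedback` function are hypothetical names, the sample entries are placeholders rather than real feedback, and an in-memory list stands in for the KG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Hypothetical record for one piece of geospatial user feedback."""
    author: str
    assets: frozenset      # identifiers of the data products concerned
    application: str       # user objective, as a free keyword
    text: str


def relevant_feedback(store, asset=None, application=None):
    """Return feedback that concerns the given asset or the given application."""
    return [
        fb for fb in store
        if (asset is not None and asset in fb.assets)
        or (application is not None and fb.application == application)
    ]


# Placeholder entries, for illustration only.
store = [
    Feedback("user1", frozenset({"BDTopo.FR", "IRIS.FR"}),
             "sustainability indicator", "placeholder text"),
    Feedback("user3", frozenset({"OtherProduct.XX"}),
             "digital twin", "placeholder text"),
]

for fb in relevant_feedback(store, asset="BDTopo.FR"):
    print(fb.author, sorted(fb.assets))
```

Matching the application by exact keyword is the simplest possible choice; a controlled vocabulary of application domains would be needed for anything more robust.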
UseCase to keep in mind for future work : Publication of providers metadata, Using the KG to publish more tractable metadata ...
- Targeted categories of users :
- Providers of data related to Buildings, to Hydrography, to LULC, in different countries but also in the same country
- Targeted KG items :
- URIs for these data products
- Explanation where to find detailed documentation
- Properties imported from existing metadata into RDF properties
- additional properties : schema, lineage
- Details of the use case :
- Check if a product is already referenced
- If not, create a URI and a node using a method adapted to the user capacities and working environment
- search for already documented data products to reuse description patterns
- Reference existing metadata
- Create additional properties
- Edit existing metadata to correct or enrich them
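The first steps of this list (check whether a product is referenced, create a URI and a node if not, then reference existing metadata) could look like the sketch below. Everything here is an assumption made for illustration: the `BASE` namespace is a placeholder and not a project URI, the graph is a plain Python set of triples rather than a triple store, and typing the node as `dcat:Dataset` and pointing to existing metadata with `rdfs:seeAlso` are just one possible modelling choice.

```python
BASE = "https://example.org/asset/"  # hypothetical namespace


def asset_uri(product, country):
    """Build a URI following the productname.country pattern."""
    return f"{BASE}{product}.{country}"


def ensure_asset(graph, product, country):
    """Return (uri, created); create the node only if it is not referenced yet.

    `graph` is a set of (subject, predicate, object) tuples.
    """
    uri = asset_uri(product, country)
    if any(subject == uri for subject, _, _ in graph):
        return uri, False
    graph.add((uri, "rdf:type", "dcat:Dataset"))  # assumed typing
    return uri, True


def reference_metadata(graph, uri, metadata_url):
    """Point from the asset node to an existing metadata record."""
    graph.add((uri, "rdfs:seeAlso", metadata_url))  # assumed property


graph = set()
uri, created = ensure_asset(graph, "BDTopo", "FR")
reference_metadata(graph, uri, "https://example.org/metadata/bdtopo.xml")  # placeholder URL
print(uri, created, len(graph))
```

Calling `ensure_asset` a second time with the same product and country returns the existing URI without adding a duplicate node.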
UseCase to keep in mind for future work : Assets discovery
A user is looking for data related to Buildings in different cities/areas in Europe. The data may come from different sources, and the user does not know the sources in advance. These sources are not necessarily linked data. The user selects a concept of interest (e.g. Building) and gets a description of the available assets. They can get information about each asset (how to access the asset, whether it is available as linked data, whether it can be interconnected with others, ...).
Internship
(archived description, to be refreshed) Intern 1: Development of the Metadata Wizard
To support users with the transformation of metadata to linked data as well as the editing and enrichment of metadata for publication in the metadata knowledge graph, it is necessary to provide users with the correct tooling. Requirements for this tool have been drafted by members of the EuroSDR Linked Data Group. This tool should be based on the LDWizard (https://github.com/pldn/LDWizard), an open-source tool that supports the transformation of data to linked data in a user-friendly manner. This code base should be used in the development of the Metadata Wizard. The selected intern will work towards building a toolset that will import in an intelligent way metadata from diverse sources related to spatiotemporal data and will add this metadata to a Knowledge Graph (KG) where they can be linked with previously existing metadata. The developed toolset would need to process, understand, link and semantically enrich the provided metadata. Main contact : Lexi Rowlands
(archived description, to be refreshed) Intern 2: Graph Alignment, Semantic Mapping, and Ontologies
Metadata for digital assets in Europe exists in a range of formats and in compliance with a range of standards and vocabularies. Usually, though, metadata follows its own semantic descriptions using one or more of the metadata vocabularies that are, or will eventually be, part of the Knowledge Graph (KG). This introduces the problem of identifying metadata descriptions about similar objects (entities) by performing some form of alignment over the fragmented semantic graphs. Before the transformation of metadata to linked data can be carried out, the range of metadata vocabularies should be identified and aligned to support the semantic and structural homogeneity of metadata. Additionally, since one of the intended features of the metadata Knowledge Graph is the richness of the metadata itself, this would allow us to enrich existing metadata with more information to support improved findability and searchability of digital assets in Europe. Modern solutions in the area include the use of Machine Learning (ML) and Artificial Intelligence (AI) techniques to allow for a more intelligent and accurate alignment and linking of the spatiotemporal metadata. Main contact : Dimitris Kotzinos
D1-D4 : Gathering knowledge for the KG : specific assets, their (rdf) metadata
- a URI : ex BDTopo.FR
- keywords : ex : topographic
- Schemas :
- possible integration with other products
D3 : existing ontologies
This analysis is now pursued through a working paper, with a focus through use cases to avoid engaging in a lengthy report
existing models outside the GI community
- DDI-CDI is a metadata model dedicated to cross domain integration https://ddialliance.org/Specification/ddi-cdi , to be considered here?
- CKAN is a cataloguing tool, compliant with DCAT, which implements relationships between datasets at metadata level : https://github.com/ckan/ckan/wiki/Dataset-relationships
series : linking sibling GI assets in time
If the assets are sufficiently similar (homogeneous), they can be treated as a time series. ISO 19115 (& the INSPIRE guidelines on using it) allow for metadata at the “series” level and then at the individual dataset level. In ISO 19115:2003, as used by INSPIRE, the dataset metadata links back to the series using an element called “series”; this is a metadata-to-metadata link. The same thing exists in the current ISO 19115-1:2014, but remodelled a bit. dct:isPartOf looks similar. dct:replaces & dct:isReplacedBy might be relevant.
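As a rough illustration of how dct:isPartOf, dct:replaces and dct:isReplacedBy could work together for a series, the sketch below keeps triples as plain tuples. The `ex:` identifiers and both function names are hypothetical, and no real dataset is described.

```python
INVERSE = {
    "dct:replaces": "dct:isReplacedBy",
    "dct:isReplacedBy": "dct:replaces",
}


def add_inverse_replacements(triples):
    """Return the triples plus the inverse of every replaces/isReplacedBy statement."""
    completed = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            completed.add((obj, INVERSE[predicate], subject))
    return completed


def members_of(triples, series):
    """List the datasets that declare themselves part of the given series."""
    return sorted(s for s, p, o in triples if p == "dct:isPartOf" and o == series)


# Hypothetical series with two editions.
triples = {
    ("ex:edition-2022", "dct:isPartOf", "ex:series"),
    ("ex:edition-2023", "dct:isPartOf", "ex:series"),
    ("ex:edition-2023", "dct:replaces", "ex:edition-2022"),
}

print(members_of(triples, "ex:series"))
print(sorted(add_inverse_replacements(triples)))
```

Storing only one direction of the replacement and deriving the other keeps the two statements from drifting apart when metadata is edited by hand.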
Linking Geodatasets and statistical surveys coverage
cf. example https://ec.europa.eu/eurostat/cros/system/files/NTTS2013fullPaper_46.pdf The linkage is done using the "spatial extent" metadata of the survey, called coverage: http://purl.org/dc/terms/coverage, and more specifically http://purl.org/dc/terms/spatial, provides a link from the metadata about the statistical survey to the “extent or scope of the resource”. The web-friendly version of that is DCAT’s dct:spatial, which takes a dct:Location. DCMI allows ‘coverage’ & ‘location’ to be specified as a named “jurisdiction”, preferably from a controlled list, rather than using a geometry. The ISO 19115 equivalent is “EX_GeographicExtent”, which allows for a polygon, a bounding box, or a ‘geographic identifier’ (a name from a controlled vocabulary). Possible identifiers for countries and administrative units are those of the ISO 3166 standard (ISO 3166-1 for countries, ISO 3166-2 for subdivisions); these are visible on Wikipedia (in the description of countries). The “problem” with these approaches is that the link goes straight to the geometry (or named place), i.e. to the “geographical asset” itself, not to its metadata. I (PeterParslow) can see some benefit in a statistical survey’s metadata allowing access to some metadata about the coverage (e.g. if the survey says it covers an administrative area such as a city, which version of the city limit was used?). If the target of the link is an INSPIRE NamedPlace, that of course can contain its validFrom/validTo temporal “metadata”. I guess the key question is: what aspect of the geographical asset’s metadata is valuable to the user of a statistical survey? ISO 19115 metadata can of course be attached to the geographical asset, but usually not much is at the feature instance level.
dct:references and dct:isReferencedBy might be of interest.
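To make the coverage-based linkage concrete, the sketch below finds the resources whose dct:spatial value matches that of a survey, using a named place from a controlled list (here a country code) instead of a geometry. The `ex:` identifiers and function names are hypothetical, and tuples stand in for RDF triples.

```python
def coverage(triples, resource):
    """Return the set of dct:spatial values declared for a resource."""
    return {o for s, p, o in triples if s == resource and p == "dct:spatial"}


def candidates_for(triples, survey):
    """List other resources sharing at least one dct:spatial value with the survey."""
    wanted = coverage(triples, survey)
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if s != survey and coverage(triples, s) & wanted
    )


# Hypothetical metadata: one survey and two geodatasets.
triples = [
    ("ex:survey", "dct:spatial", "FR"),
    ("ex:geodataset-a", "dct:spatial", "FR"),
    ("ex:geodataset-b", "dct:spatial", "FI"),
]

print(candidates_for(triples, "ex:survey"))
```

As the discussion above notes, a match on the identifier alone says nothing about which version of the boundary was used; that information would have to come from the metadata of the geographical asset itself.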
Linking Geodatasets and Statistical surveys through integration process
- create a link between the datasets which holds information about how to integrate both datasets. It may use free text but may also integrate dct:references and dct:isReferencedBy. It may also reference an external asset that supports interconnection : in the case of France, BDAdresse Premium stores relationships between addresses, buildings (from IGN BDTopo) and IRIS (from statistical survey)
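One possible reading of this idea is to make the link a resource of its own, carrying the free-text description and pointing at both datasets and at any supporting asset. The sketch below follows that reading; the `IntegrationLink` class, the `ex:link-1` identifier and the product identifiers are hypothetical, and using `dct:description` for the free text is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class IntegrationLink:
    """Hypothetical link resource describing how two datasets are integrated."""
    source: str
    target: str
    how: str                                   # free-text description
    supporting_assets: list = field(default_factory=list)

    def to_triples(self, link_id):
        """Express the link as (subject, predicate, object) tuples."""
        triples = [(link_id, "dct:description", self.how)]
        for dataset in (self.source, self.target, *self.supporting_assets):
            triples.append((link_id, "dct:references", dataset))
            triples.append((dataset, "dct:isReferencedBy", link_id))
        return triples


link = IntegrationLink(
    source="BDTopo.FR",
    target="IRIS.FR",
    how="Buildings and IRIS units are related through addresses.",
    supporting_assets=["BDAdressePremium.FR"],
)

for triple in link.to_triples("ex:link-1"):
    print(triple)
```

Keeping the link as a separate node means the same pair of datasets can carry several integration descriptions, one per user experience.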
Vocabularies to describe topicality
- GEMET is the thesaurus used in INSPIRE metadata to describe topicality. It is available as RDF : https://www.eionet.europa.eu/gemet/en/webservices/
Need for vocabularies from application domains
D4 : Core KG model, list of digital assets and their metadata to integrate in the KG
Core KG model
- asset URI : productname.country
- keywords : topographic, statistic, ...
- ISO19115 metadata : link to the url or file
- Conceptual schema ?
- Links : how it compares to different complementary products
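The core model above can be sketched as a small record serialized to Turtle-like text. This is only a sketch: the `Asset` class and the base URI are hypothetical, the choice of `dcat:keyword`, `rdfs:seeAlso`, `dct:conformsTo` and `dct:relation` is an assumption and not a project decision, prefix declarations are omitted, and keywords are assumed to contain no double quotes.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical record for one digital asset of the core KG model."""
    product: str
    country: str
    keywords: list = field(default_factory=list)
    iso19115_url: str = ""     # link to the ISO 19115 metadata (URL or file)
    schema_url: str = ""       # conceptual schema, if any
    related: list = field(default_factory=list)  # identifiers of complementary products

    @property
    def identifier(self):
        return f"{self.product}.{self.country}"

    def to_turtle(self, base="https://example.org/asset/"):
        """Serialize the asset as one Turtle statement block (prefixes omitted)."""
        lines = [f"<{base}{self.identifier}> a dcat:Dataset"]
        lines += [f'    dcat:keyword "{k}"' for k in self.keywords]
        if self.iso19115_url:
            lines.append(f"    rdfs:seeAlso <{self.iso19115_url}>")
        if self.schema_url:
            lines.append(f"    dct:conformsTo <{self.schema_url}>")
        lines += [f"    dct:relation <{base}{r}>" for r in self.related]
        return " ;\n".join(lines) + " ."


asset = Asset("BDTopo", "FR", keywords=["topographic"], related=["IRIS.FR"])
print(asset.to_turtle())
```

A generic `dct:relation` does not say how two products compare; a more specific property, or a dedicated link resource, would be needed to capture that.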
Project factsheet
Funded by EuroSDR, start date : 28/09/2022, end date : 31/12/2024
Contacts : Bénédicte Bucher (IGN-F), Erwin Folmer (Kadaster), Lexi Rowlands (Kadaster)
Participating EuroSDR members : France, Netherlands, Finland, UK, Switzerland, Slovenia, Norway, Spain, Sweden, Denmark
Milestones : Kick off (Autumn 2022), Mid-term report and dissemination Workshop (September 6th 2023), Final report and dissemination Workshop (Fall 2024)
Budget : 10 000 € intern, 2 000 € attendance of students at the workshop, 1 000 € to fund an open access data paper
Context : Combining data across different domains still requires too heavy an engineering workload (coping with heterogeneities), which leaves less money and time for creating real value out of data. Besides, it is important for Europe to avoid designing the same solution 27 times (with public money) and rather to transpose and scale up solutions, even if the appropriate data can differ from one country to another.
Objectives : In this context, the 140th EuroSDR Board of delegates adopted a project to engage EuroSDR members and their network in the linkage of datasets across Europe. The project studies the publication of metadata as linked data and the creation of links between metadata records. The main motivation is to set up a Metadata Knowledge Graph (MKG) ecosystem to support the joint usability of different digital assets that exist in Europe. Targeted objectives are more precisely : the design of vocabularies and of an infrastructure for such a MKG ecosystem, the implementation of use cases, the continuous engagement of more organisations in the project during these 2 years.
List of Deliverables :
- D1 : Use case definition, identification of user groups and metadata records
- D2 : Metadata and links editing, transforming and publication tool
- D3 : Ontologies and vocabularies
- D4 : Metadata Knowledge Graph and associated stories
- D5 : Use case validation
- D6 : Documentation set : Academic papers, Cookbook, INSPIRE Good Practices, UN GGIM presentation