Workshops 2018

We have prepared two parallel workshops for you to choose from. Both take place on 7 June, the day after the CDO Forum 2018 conference. You need to choose your workshop when registering for the conference (workshops are charged separately).

NOTE: The workshop "Creating, Using and Managing a Data Lake in the Enterprise" will be held in English. The organiser does not provide translation.

Workshop: (Almost) Everything About Data Quality
Workshop: Creating, Using and Managing a Data Lake in the Enterprise

Workshop: (Almost) Everything About Data Quality

Duration

The workshop takes place on 7 June, 9:00 – 17:10

Description

In response to demand from CDO Forum participants, we have prepared a dedicated workshop devoted entirely to managing and maintaining data quality. Together with the Programme Council, we selected the topics that came up most often in the post-CDO Forum surveys and in the study conducted among participants of last year's edition of the Forum. To present the individual topics, we invited experienced experts and practitioners who manage and work with data in their organisations every day. They have extensive knowledge of continuous data quality improvement and would like to share it with you.

Benefits for participants

By taking part in the workshop:

• You will systematise your knowledge of data quality management
• Through case studies, you will learn how to solve specific problems and challenges in maintaining the highest quality of your data
• You will learn best practices for data flow within the organisation and between systems
• You will find out which tools are worth using to manage data and its quality
• You will gain knowledge about monitoring and taking corrective action
• You will learn the practical aspects of data quality in analytical models and Data Science

Workshop agenda

8.30 - 9.00

Participant registration

9.00 - 9.10

Workshop opening

Łukasz Suchenek

Content Manager, Evention

9.10 - 10.30

Is there a pilot on board? How does data quality affect the organisation?

In the age of integrated systems, artificial intelligence and advanced analytics based on machine learning, the quality of the data in our IT systems plays a fundamental role in how effectively those systems operate. How can we make sure that investments in technology deliver the desired business results without requiring an army of people to process and analyse the data entered?

Jarosław Chrupek

Global Head of Data Management, British American Tobacco; President of the Management Board, DAMA Poland Chapter

10.30 - 10.50

Coffee break

10.50 - 12.10

Data: how to classify it and where to look for data owners?

The ability to manage data independently of managing the "data carrier", that is, applications and infrastructure, is becoming a necessity. Managing something as virtual as data is a considerable challenge. Identifying the groups of data processed in the organisation and categorising them by importance therefore becomes a key issue. This, however, raises the problem of choosing the classification key and the levels of granularity. No less important in data management are the participants in the processes, in particular the data owner, who bears responsibility for a group of data and who de facto decides on the policies and standards that apply to that data.

Dorota Paszek

Manager, Data Certification and Data Flows Team, CDO Office, Bank Zachodni WBK

12.10 - 13.30

Anomaly detection: the key to preventing a decline in business efficiency

Anomalies in data should always be analysed. They are often a consequence of a drop in data quality, but they can also be caused by a phenomenon that was not anticipated. How can these situations be told apart, and what action should be taken in each case? The talk will present studies of real cases in which anomaly analysis led to a significant improvement in the company's operational efficiency.

Tomasz Brzeziński

Chief Data Scientist, iTaxi

13.30 - 14.20

Lunch

14.20 - 15.40

Data lineage: how to ensure proper data flow, the most common problems and ways to solve them; supporting tools*

Metadata is a much overlooked requirement of data warehousing, primarily because it is not straightforward to bring together all the metadata from the different systems. Ab Initio can help you not only to process all your Big Data but also to collect and organise the metadata produced by all those processes.

1. General Introduction to Ab Initio
2. What Is Metadata and Why Is It Important?
3. What Is the Metadata Hub and How Can It Help You in Your Everyday Work?
4. Looking Under the Hood – Demo

*The presentation will be given in English. The organiser does not provide translation.

Balazs Petenyi

Senior Architect, Ab Initio

15.40 - 16.00

Coffee break

16.00 - 17.20

Why isn't my model as good as I expected? Or: how data quality affects predictive models.

It is estimated that about 80% of a Data Scientist's working time is taken up by data processing tasks: collecting source data, feature engineering and data cleaning. Spending this much time on analysing and processing source data makes it possible to maximise data quality, which translates directly into the effectiveness of the analytical models being built. The workshop will present potential problems that can arise when collecting input data, and show how the quality of variables affects the performance and stability of predictive models.

Sylwia Wyłupek

Business Data Scientist, Citi Handlowy

Workshop: Creating, Using and Managing a Data Lake in the Enterprise

Duration

The workshop runs from 9.00 to 17.00

Participation in the workshop is limited to a small group (maximum 25 people). Places are allocated in order of registration!

Description

Most organisations today are dealing with multiple silos of information. These include cloud and on-premises based transaction processing systems, multiple data warehouses, data marts, reference data management (RDM) systems, master data management (MDM) systems, content management (ECM) systems and, more recently, Big Data NoSQL platforms such as Hadoop and other NoSQL databases. In addition the number of data sources is increasing dramatically, especially from outside the enterprise. Given this situation it is not surprising that many companies have ended up managing information in silos with different tools being used to prepare and manage data across these systems with varying degrees of governance. In addition, it is not only IT that is now integrating data. Business users are also getting involved with new self-service data preparation tools. The question is, is this the only way to manage data? Is there another level that we can reach to allow us to more easily manage and govern data across an increasingly complex data landscape consisting of multiple data stores?

This 1-day seminar looks at the challenges faced by companies trying to deal with an exploding number of data sources, collecting data in multiple data stores (cloud and on-premises), multiple analytical systems and at the requirements to be able to define, govern, manage and share trusted high quality information in a distributed and hybrid computing environment. It also explores a new approach of how IT data architects, business users and IT developers can collaborate together in building and managing a logical data lake to get control of your data. This includes data ingestion, automated data discovery, data profiling and tagging and publishing data in an information catalog. It also involves refining raw data to produce enterprise data services that can be published in a catalog available for consumption across your company. We also introduce multiple data lake configurations including a centralised data lake and a ‘logical’ distributed data lake as well as execution of jobs and governance across multiple data stores. It emphasises the need for a common collaborative approach to governing and managing data of all types.

Benefits for participants

Attendees will learn:

• How to define a strategy for producing trusted data as-a-service in a distributed environment of multiple data stores and data sources
• How to organise data in a centralised or distributed data environment to overcome complexity and chaos
• How to design, build, manage and operate a logical or centralised data lake within their organisation
• The critical importance of an information catalog in understanding what data is available as a service
• How data standardisation and business glossaries can help make sure data is understood
• An operating model for effective distributed information governance
• What technologies and implementation methodologies they need to get their data under control
• How to apply methodologies to get master and reference data, big data, data warehouse data and unstructured data under control irrespective of whether it be on-premises or in the cloud

Who should attend

This seminar is intended for business data analysts doing self-service data integration, data architects, chief data officers, master data management professionals, content management professionals, database administrators, big data professionals, data integration developers, and compliance managers who are responsible for data management. This includes metadata management, data integration, data quality, master data management and enterprise content management. The seminar is not only for ‘Fortune 500 scale companies’ but for any organisation that has to deal with Big Data, small data, multiple data stores and multiple data sources. It assumes that you have an understanding of basic data management principles as well as a high level of understanding of the concepts of data migration, data replication, metadata, data warehousing, data modelling, data cleansing, etc.

Workshop agenda

9.00 – 10.00 STRATEGY & PLANNING

This session introduces the data lake together with the need for a data strategy and looks at the reasons why companies need it. It looks at what should be in your data strategy, the operating model needed to implement, the types of data you have to manage and the scope of implementation. It also looks at the policies and processes needed to bring your data under control.

• The ever increasing distributed data landscape
• The siloed approach to managing and governing data
• IT data integration, self-service data preparation or both? – data governance or data chaos?
• Key requirements for data management
- Structured
- Semi-structured data
- Unstructured data
- Re-usable services to manage data
• Dealing with new data sources – cloud data, sensor data, social media data, smart products (the internet of things)
• Understanding scope of your data lake
- OLTP system sources
- Data Warehouses
- Big Data systems, e.g. Hadoop
- MDM and RDM systems
- Data virtualisation
- Streaming data
- Enterprise Content Management
• Building a business case for data management
• Defining an enterprise data strategy
• A new inclusive approach to governing and managing data
• Introducing the data lake and data refinery
• Data lake configurations – what are the options?
- Centralised, distributed or logical
• The rising importance of an Information catalog
• Integrating a data lake into your enterprise analytical architecture

10.00 – 11.00 METHODOLOGY & TECHNOLOGIES

Having understood strategy, this session looks at multiple methodologies and the technologies needed to help apply it to your structured and multi-structured data to bring it under control. It also looks at how platforms like Hadoop and common data services provide the foundation to manage information across the enterprise.

• Information production and information consumption
• Data lake use cases
• The role of data management technology platforms in managing data across multiple data stores
• A best practice step-by-step methodology for structured data governance
• Why the methodology has to change for semi-structured and unstructured data
• Methodologies for structured vs multi-structured data
• Technology components in the new world of distributed data
• Implementation run-time options – the need to execute in multiple environments

11.00 – 11.15 Coffee break

11.15 – 12.15 DATA STANDARDISATION, THE BUSINESS GLOSSARY AND THE INFORMATION CATALOG

This session looks at the need for data standardisation of structured data and of new insights from processing unstructured data. The key to making this happen is to create common data names and definitions for your data to establish a shared business vocabulary (SBV). The SBV should be defined and stored in a business glossary and is important for information consumers to understand published data in a data lake. It also looks at the emergence of more powerful information catalog software and how business glossaries have become part of what a catalog offers.

• Semantic data standardisation using a shared business vocabulary within an information catalog
• The role of an SBV in MDM, RDM, SOA, DW and data virtualisation
• Why is an SBV relevant in a data lake and a Logical Data Warehouse?
• Approaches to creating an SBV
• Business glossary products storing SBV business data names
• Formalising governance of business data names, e.g. the dispute resolution process
• Business involvement in SBV creation
• Beyond structured data – from business glossary to information catalog
• What is an Information Catalog?
• Why are information catalogs becoming critical to data management?
• Information catalog technologies

12.15 – 13.15 ORGANISING AND OPERATING THE DATA LAKE

This session looks at how to organise data to still be able to manage it in a complex data landscape. It looks at zoning, versioning, the need for collaboration between business and IT and the use of an information catalog in managing the data.

• Organising data in a centralised or distributed data lake
• Creating zones to manage data
• New requirements for managing data in centralised and distributed data lakes
• Creating collaborative data lake projects
• Hadoop as a staging area for enterprise data cleansing and integration
• Core processes in data lake operations
• The data ingestion process
• Tools and techniques for data ingestion
• Implementing systematic disparate data and data relationship discovery using Information catalog software
• Using domains and machine learning to automate and speed up data discovery and tagging
• Automated profiling and tagging and cataloguing of data
• Automated data mapping
• The data classification and policy definition processes
• Manual and automated data classification to enable governance
• Using tag based policies to govern data

13.15 – 14.00 Lunch

14.00 – 15.30 THE DATA REFINERY PROCESS

This session looks at the process of refining data to produce trusted information.

• What is a data refinery?
• Key requirements for refining data
• The need for multiple execution engines to run in multiple environments
• Options for refining data – ETL versus self-service data preparation
• Key approaches to scalable ETL data integration using Apache Spark
• Self-service data preparation tools for Spark and Hadoop
• Automated data profiling using analytics in data preparation tools
• Executing data refinery jobs in a distributed data lake using Apache Beam to run anywhere
• Approaches to integrating IT ETL and self-service data preparation
• Apache Atlas Open Metadata & Governance
• Joined up analytical processing from ETL to analytical workflows
• Publishing data and data integration jobs to the information catalog
• Mapping produced data of value into your DW and business vocabulary
• Data provisioning – provisioning consistent information into data warehouses, MDM systems, NoSQL DBMSs and transaction systems
• Provisioning consistent refined data using data virtualisation, a logical data warehouse and on-demand information services
• Governing the provisioning process using rules-based metadata

15.30 – 17.00 INFORMATION AUDIT & PROTECTION – THE FORGOTTEN SIDE OF DATA GOVERNANCE

Over recent years we have seen many major brands suffer embarrassing publicity due to data security breaches that have damaged their brand and reduced customer confidence. With data now highly distributed and so many technologies in place that offer audit and security, many organisations end up with a piecemeal approach to information audit and protection. Policies are everywhere with no single view of the policies associated with securing data across the enterprise. The number of administrators involved is often difficult to determine and regulatory compliance is now demanding that data is protected and that organisations can prove this to their auditors. So how are organisations dealing with this problem? Are the same data privacy policies enforced everywhere? How is data access security co-ordinated across portals, processes, applications and data? Is anyone auditing privileged user activity? This session defines this problem, looks at the requirements needed for Enterprise Data Audit and Protection and then looks at what technologies are available to help you integrate this into your data strategy.

• What is Data Audit and Security and what is involved in managing it?
• Status check – Where are we in data audit, access security and protection today?
• What are the requirements for enterprise data audit, access security and protection?
• What needs to be considered when dealing with the data audit and security challenge?
• Automatic data discovery and the information catalog – a huge help in identifying sensitive data
• What about privileged users?
• Using a data management platform and information catalog to govern data across multiple data stores
• Securing and protecting data using tag based policies in an information catalog
• What technologies are available to protect data and govern it?
• Can these technologies help in GDPR?
• How do they integrate with Data Governance programs?
• How to get started in securing, auditing and protecting your data

Instructor

Mike Ferguson

Managing Director, Intelligent Business Strategies Ltd.

The workshop will be held in English. The organiser does not provide translation.