2017 Australasian Database Conference

PhD School Tutorials:

Dr Divesh Srivastava, AT&T Labs-Research

Bio: Divesh Srivastava is the head of Database Research at AT&T Labs-Research. He is a Fellow of the Association for Computing Machinery (ACM) and the managing editor of the Proceedings of the VLDB Endowment (PVLDB). He has served as a trustee of the VLDB Endowment, as an associate editor of the ACM Transactions on Database Systems (TODS), as an associate Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), and as a general or program committee co-chair of many conferences. He has presented keynote talks at several international conferences, and his research interests and publications span a variety of topics in data management. He received his Ph.D. from the University of Wisconsin, Madison, USA, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India.

Title: Information theory for data management

Abstract: This tutorial explores the use of information theory as a tool to express and quantify notions of information content and information transfer for representing and analyzing data. We do so in an application-driven way, using a variety of data management applications, including database design, data integration and data anonymization.
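
Illustrative sketch (not taken from the tutorial material): one standard way information theory connects to database design is that a functional dependency X -> Y holds in a table exactly when the conditional entropy H(Y | X) of the table's empirical distribution is zero. The Python below is a minimal example under that assumption; the table `rows`, its column names, and the helper functions `entropy` and `conditional_entropy` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(rows, x, y):
    """H(Y | X) = H(X, Y) - H(X), computed over the rows of a table."""
    joint = [(r[x], r[y]) for r in rows]
    return entropy(joint) - entropy([r[x] for r in rows])


# Hypothetical toy table: zip determines city, but zip does not determine name.
rows = [
    {"zip": "4072", "city": "Brisbane", "name": "Ann"},
    {"zip": "4072", "city": "Brisbane", "name": "Bob"},
    {"zip": "3010", "city": "Melbourne", "name": "Cy"},
    {"zip": "3010", "city": "Melbourne", "name": "Di"},
]

h_city_given_zip = conditional_entropy(rows, "zip", "city")  # zero: zip -> city holds
h_name_given_zip = conditional_entropy(rows, "zip", "name")  # positive: zip -> name fails
```

A value of zero for `h_city_given_zip` says that knowing `zip` leaves no uncertainty about `city`, while the positive value for `h_name_given_zip` quantifies how much information about `name` is still missing once `zip` is known.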

Dr Gianluca Demartini, The University of Sheffield

Bio: Dr. Gianluca Demartini is a Senior Lecturer in Data Science at the University of Sheffield, Information School. His research is currently supported by the UK Engineering and Physical Sciences Research Council (EPSRC) and by the EU H2020 framework program. His main research interests are Information Retrieval, Semantic Web, and Human Computation. He received the Best Paper Award at the European Conference on Information Retrieval (ECIR) in 2016 and the Best Demo Award at the International Semantic Web Conference (ISWC) in 2011. He has published more than 70 peer-reviewed scientific publications, including papers at major venues such as WWW, ACM SIGIR, VLDBJ, ISWC, and ACM CHI. He has given invited talks, tutorials, and keynotes at academic conferences (e.g., ISWC, ICWSM, WebScience, and the RuSSIR Summer School), companies (e.g., Facebook), and Dagstuhl seminars. He has been an ACM Distinguished Speaker since 2015. He serves as an area editor for the Journal of Web Semantics, as Student Coordinator for ISWC 2017, and as a Senior Program Committee member for the AAAI Conference on Human Computation and Crowdsourcing (HCOMP), the International Conference on Web Engineering (ICWE), and the ACM International Conference on Information and Knowledge Management (CIKM). He is a Program Committee member for several conferences, including WWW, SIGIR, KDD, IJCAI, ISWC, and ICWSM. He was co-chair of the Human Computation and Crowdsourcing Track at ESWC 2015. He co-organized the Entity Ranking Track at the Initiative for the Evaluation of XML Retrieval in 2008 and 2009. Before joining the University of Sheffield, he was a post-doctoral researcher at the eXascale Infolab at the University of Fribourg in Switzerland, a visiting researcher at UC Berkeley, a junior researcher at the L3S Research Center in Germany, and an intern at Yahoo! Research in Spain. In 2011, he obtained a Ph.D. in Computer Science at the Leibniz University of Hanover, focusing on Semantic Search.

Title: Crowdsourcing for Data Management

Abstract: In this session we will introduce the concepts of micro-task crowdsourcing and human computation, presenting examples of hybrid human-machine systems for entity linking, data integration and search. Such systems are examples of how the use of human intelligence at scale, in combination with machine-based algorithms, can outperform traditional data management systems. In this context, we will then discuss efficiency and effectiveness challenges of micro-task crowdsourcing platforms, including spam, quality control, task assignment models, and job scheduling.

Prof Rui Zhang, The University of Melbourne

Bio: Dr Rui Zhang obtained his Bachelor's degree from Tsinghua University in 2001 and his PhD from the National University of Singapore in 2006. Before joining the University of Melbourne, he was a visiting research scientist at AT&T Labs-Research in New Jersey and at Microsoft Research in Redmond, Washington. Since January 2007, he has been a faculty member in the Department of Computing and Information Systems at The University of Melbourne. Recently, he has been a visiting researcher at Microsoft Research Asia in Beijing, regularly collaborating on his ARC Future Fellowship project. Dr Zhang's research interests are in data and information management in general, and particularly in high-performance computing, spatial and temporal data analytics, moving object management, indexing techniques, data streams and sequence databases. His inventions have been adopted by major IT companies such as AT&T and Microsoft. In 2015, Dr Zhang received the Chris Wallace Award for Outstanding Research from the Computing Research and Education Association of Australasia (CORE) in recognition of his significant contributions to the management and mining of spatiotemporal and multidimensional data. Representative projects that Dr Zhang is leading are listed on the Spatial and Temporal Data Analytics page.

Title: Contextual Intent Tracking for Personal Assistants

Abstract: A new paradigm of recommendation is emerging in intelligent personal assistants such as Apple's Siri, Google Now, and Microsoft Cortana, which recommends "the right information at the right time" and proactively helps you "get things done". This type of recommendation requires precisely tracking users' contemporaneous intent, i.e., what type of information (e.g., weather, stock prices) users currently intend to know, and what tasks (e.g., playing music, getting taxis) they intend to do. Users' intent is closely related to context, which includes both external environments such as time and location, and users' internal activities that can be sensed by personal assistants. The relationship between context and intent exhibits complicated co-occurring and sequential correlation, and contextual signals are also heterogeneous and sparse, which makes modeling the context-intent relationship a challenging task. To solve the intent tracking problem, we propose the Kalman filter regularized PARAFAC2 (KP2) nowcasting model, which compactly represents the structure and co-movement of context and intent. The KP2 model utilizes collaborative capabilities among users, and learns for each user a personalized dynamic system that enables efficient nowcasting of users' intent. Extensive experiments using real-world data sets from a commercial personal assistant show that the KP2 model significantly outperforms various methods, and provides inspiring implications for deploying large-scale proactive recommendation systems in personal assistants.

Dr Junhao Gan, The University of Queensland

Bio: Dr. Junhao Gan is currently a Post-Doctoral Research Fellow in the School of Information Technology and Electrical Engineering (ITEE) at the University of Queensland (UQ). Before starting the postdoc appointment, he completed his PhD in the School of ITEE at UQ in 2017 under the supervision of Prof. Yufei Tao. He obtained his bachelor's and master's degrees from the School of Software, Sun Yat-Sen University, in 2011 and 2013, respectively. His research interests lie in designing practical algorithms with non-trivial theoretical guarantees for solving problems on massive data. He has published several papers at SIGMOD and in TODS. He also won the Best Paper Award at SIGMOD 2015.

Title: Euclidean DBSCAN Revisited: From Static to Dynamic

Abstract: DBSCAN is a highly successful density-based clustering method for multi-dimensional points. Its seminal paper won the Test-of-Time Award at KDD 2014, and it has over 9000 citations on Google Scholar. Although DBSCAN has been applied extensively, its computational hardness remained unresolved until the recent work at SIGMOD 2015. This talk focuses on the problem of computing DBSCAN clusters on a set of n points in d-dimensional space from scratch (assuming no existing indexes) under the Euclidean distance. More specifically, we first show that the DBSCAN problem is “hard” in spaces of three or more dimensions. Motivated by this, we propose a relaxed version of the problem called ρ-approximate DBSCAN, which returns the same clusters as DBSCAN unless the clusters are “unstable”. The ρ-approximate problem is “easy” regardless of the constant dimensionality. This talk further discusses the algorithmic principles for dynamic clustering by DBSCAN. Surprisingly, we prove that the ρ-approximate version suffers from the very same hardness when the dataset is fully dynamic. We also show that this issue goes away as soon as a tiny further relaxation is applied, while still ensuring the same quality as ρ-approximate DBSCAN.
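
Illustrative sketch (not the algorithms presented in the talk): for readers unfamiliar with the problem, the Python below computes classic, exact DBSCAN clusters with a brute-force neighbour search that takes quadratic time in n. It is neither the ρ-approximate variant nor any of the dynamic algorithms discussed above. The parameter names `eps` (neighbourhood radius) and `min_pts` (density threshold) and the sample points are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def dbscan(points, eps, min_pts):
    """Return one cluster id per point (-1 means noise)."""
    n = len(points)
    # Brute-force eps-neighbourhoods; each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points within distance eps.
    is_core = [len(nb) >= min_pts for nb in neighbours]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a new cluster from an unlabelled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points propagate the cluster further;
                    # border points are labelled but not expanded.
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels


# Hypothetical sample: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this sketch a border point that lies within `eps` of core points from two clusters is assigned to whichever cluster reaches it first, which is one common convention.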

Dr Wen Hua, The University of Queensland

Bio: Dr Wen Hua is a Lecturer at the School of Information Technology and Electrical Engineering (ITEE), the University of Queensland. She received her PhD and bachelor's degrees in Computer Science from Renmin University of China in 2015 and 2010, respectively. After completing her PhD, she was appointed as a Postdoctoral Research Fellow at the University of Queensland. Her research interests include sensor data analytics, information extraction, data mining, and social media analysis. She has published papers as the main author in reputed journals and international conferences such as SIGMOD, PVLDB, ICDE, TKDE, IJCAI, CIKM, WSDM, and WWWJ. She won the Best Paper Award at ICDE 2015, and she was also awarded the Advance Queensland Research Fellowship in 2017.

Title: Big Data Meets the Microgrid: Challenges and Opportunities

Abstract: A microgrid is a discrete energy system consisting of distributed energy sources (including demand management, storage, and generation) and loads capable of operating in parallel with, or independently from, the main power grid. It paves the way to effectively integrating various sources of distributed generation, especially Renewable Energy Sources (RES), and provides a good solution for supplying power in an emergency through its ability to switch between islanded mode and grid-connected mode. On the other hand, control and protection are big challenges in this type of network configuration, which motivates the use of data analytics techniques to enable smarter control in the microgrid ecosystem. With an extensive range of sensors installed, a microgrid continuously generates streaming machine-to-machine (M2M) data to support dynamic optimization of power generation, consumption and storage. Microgrids bring significant new challenges and opportunities to data management and data analytics, ranging from data acquisition, data quality control, data compression, data fusion, and data disaggregation to data mining and data prediction. In this talk we will introduce this emerging area to the data management community, with an emphasis on the challenges and some promising research topics in large-scale microgrid data analytics.