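Research Topics
- Multi-replica control: Model message propagation and replica control in distributed storage systems using colored Petri nets and queueing theory, and analyze the sensitivity, effectiveness, and efficiency of consistency control protocols in multi-replica storage. Analyze queueing phenomena in the system and study Gossip protocol improvements, replica repair strategies, optimization of inherent replica consistency, and configuration of non-inherent consistency, in order to optimize replica control protocols.
- Query engine: Study a unified query language and engine spanning NoSQL databases and traditional relational databases, with query executors that support non-relational data such as vector indexes and quantization indexes. Study query optimization strategies under multiple replicas.
- Analytics engine: Study distributed computing frameworks for statistical analysis and machine learning (Spark, MapReduce, etc.), using multi-replica storage, multi-view access, multi-task scheduling, elastic consistency control, and online dynamic updates to improve computation and communication efficiency. Targeting the 4V characteristics of big data, study common analysis algorithms such as multi-source learning, multimodal learning, multi-task learning, deep learning, and transfer learning. Study the online management, maintenance, and serving of analysis models, using machine intelligence methods such as adaptive learning to transfer and adapt historical models to online data and tasks.
- Runtime optimization: Manage and analyze the logs, monitoring data, crowdsourced data, and other information that the components of the big data platform produce at runtime, and apply optimization theory and statistical learning to tune component configuration parameters automatically. Closely combine machine intelligence with software engineering methods to chart a direction for building self-adaptive software systems.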
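The Gossip-style message propagation mentioned under multi-replica control can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the protocol studied by the group: the function name `gossip_rounds`, the push-only strategy, and the `fanout` parameter are all chosen here for illustration.

```python
import random


def gossip_rounds(n_replicas: int, fanout: int, seed: int = 0) -> int:
    """Count push-gossip rounds until every replica holds an update.

    Assumption: replica 0 receives the write first, and in each round
    every informed replica pushes the update to `fanout` random peers.
    """
    if n_replicas > 1 and fanout < 1:
        raise ValueError("fanout must be at least 1")
    fanout = min(fanout, n_replicas - 1)  # cannot pick more peers than exist
    rng = random.Random(seed)             # seeded for reproducible runs
    informed = {0}
    rounds = 0
    while len(informed) < n_replicas:
        newly_informed = set()
        for node in informed:
            peers = [p for p in range(n_replicas) if p != node]
            newly_informed.update(rng.sample(peers, fanout))
        informed |= newly_informed
        rounds += 1
    return rounds
```

Running the function over a range of `fanout` values gives a rough feel for how propagation delay, and therefore the window of replica inconsistency, shrinks as each replica contacts more peers per round.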
Representative Research Results
- Mingsheng Long, Jianmin Wang, Yue Cao. Learning Transferable Features with Deep Adaptation Networks. ArXiv 2015. (To Appear)
- Mingsheng Long, Jianmin Wang, Jiaguang Sun, Philip S. Yu. Domain Invariant Transfer Kernel Learning. IEEE Transactions on Knowledge and Data Engineering, TKDE 99: 1-14 (2015)
- Mingsheng Long, Jianmin Wang, Guiguang Ding, et al. Transfer Learning with Graph Co-Regularization. IEEE Transactions on Knowledge and Data Engineering, TKDE 26(7): 1805-1818 (2014)
- Mingsheng Long, Jianmin Wang, Guiguang Ding, et al. Adaptation Regularization: A General Framework for Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, TKDE 26(5): 1076-1089 (2014)
- Xiangdong Huang, Jianmin Wang, Jian Bai, et al. Inherent Replica Inconsistency in Cassandra. IEEE International Conference on Big Data, BigData 2014: 740-747
- Yuqing Zhu, Philip S. Yu, Jianmin Wang. RECODS: Replica consistency-on-demand store. IEEE International Conference on Data Engineering, ICDE 2013: 1360-1363
- Yuqing Zhu, Philip S. Yu, Jianmin Wang. Latency Bounding by Trading off Consistency in NoSQL Store: A Staging and Stepwise Approach. CoRR abs/1212.1046 (2012)
- Yuqing Zhu, Jianmin Wang. Client-centric consistency formalization and verification for system with large-scale distributed data storage. Future Generation Comp. Syst. 26(8): 1180-1188 (2010)
- Jianmin Wang, Xiangdong Huang. Method for controlling multi-replica consistency in a distributed computer data storage system. Patent CN201410165580.X
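Research Topics
- Event data quality control: For multi-source event logs that suffer from missing events, out-of-order events, incorrect labels, and structural errors, study efficient mechanisms for filtering and repairing each trace in the log under the guidance of a correct process model. For traces that no longer conform to the process model because a dynamic, changing execution environment has altered the process, study techniques that automatically repair the original process model at minimum cost based on correct event logs. This work also includes stream data cleaning under speed constraints.
- Event data integration and management: For the rapidly growing volume of process-oriented event sequence data and process models from many industries, study key techniques for handling multi-source, heterogeneous, massive event logs (also called process instance logs) and process models: efficient integration, unified storage, feature extraction, similarity computation, difference computation, classification and clustering, multidimensional indexing, and integrated retrieval. Building on big data processing platforms, study distributed parallel algorithms that accelerate these techniques, laying a solid foundation for the management, analysis, and reuse of event data.
- Process data analysis and mining: Develop parallelization techniques that decompose a process mining problem into many smaller mining problems distributed across a computer cluster (T1). For applications that cannot store all events over very long periods, develop on-the-fly process mining techniques that learn process models incrementally without storing all events (T2). Develop comparative process mining techniques that systematically highlight commonalities and differences, so that heterogeneous processes that change over time and have many variants can be handled (T3).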
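The idea behind T2, learning from an event stream without retaining every event, can be sketched with an incrementally updated directly-follows graph. This is an illustrative assumption rather than the group's technique: the class name `StreamingDFG` and its methods are hypothetical, and a directly-follows graph is only one simple summary from which a process model could later be derived.

```python
from collections import defaultdict


class StreamingDFG:
    """Directly-follows graph maintained one event at a time.

    Only the most recent activity of each open case is kept in memory,
    so the full event history never needs to be stored.
    """

    def __init__(self):
        self.last_activity = {}              # case id -> latest activity
        self.edge_counts = defaultdict(int)  # (a, b) -> times b followed a

    def observe(self, case_id, activity):
        """Consume one event and update the edge counts."""
        previous = self.last_activity.get(case_id)
        if previous is not None:
            self.edge_counts[(previous, activity)] += 1
        self.last_activity[case_id] = activity

    def close_case(self, case_id):
        """Forget a finished case to free its per-case state."""
        self.last_activity.pop(case_id, None)
```

Feeding events such as `("c1", "A")`, `("c1", "B")` through `observe` accumulates the count for the edge `("A", "B")`; memory grows with the number of open cases and distinct activity pairs, not with the length of the log.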
Research Results
- Jianmin Wang, Shaoxu Song, Xuemin Lin, Xiaochen Zhu, Jian Pei. Cleaning Structured Event Logs: A Graph Repair Approach. IEEE International Conference on Data Engineering, ICDE 2015
- Shaoxu Song, Aoqian Zhang, Jianmin Wang, Philip S. Yu. SCREEN: Stream Data Cleaning under Speed Constraints. ACM SIGMOD International Conference on Management of Data, SIGMOD 2015
- Xiaochen Zhu, Shaoxu Song, Xiang Lian, Jianmin Wang, Lei Zou. Matching Heterogeneous Event Data. ACM SIGMOD International Conference on Management of Data, SIGMOD 2014: 1211-1222
- Xiaochen Zhu, Shaoxu Song, Jianmin Wang, Philip S. Yu, Jiaguang Sun. Matching Heterogeneous Events with Patterns. IEEE International Conference on Data Engineering, ICDE 2014: 376-387
- Tao Jin, Jianmin Wang, Yun Yang, Lijie Wen, Keqin Li. Refactor Business Process Models with Maximized Parallelism. IEEE Transactions on Services Computing, 2014
- Tao Jin, Jianmin Wang, Lijie Wen, Gen Zou. Computing Refined Ordering Relations with Uncertainty for Acyclic Process Models. IEEE Transactions on Services Computing, 2014
- Jianmin Wang, Tao Jin, Raymond K. Wong, Lijie Wen. Querying business process model repositories - A survey of current approaches and issues. World Wide Web, 2014
- Hedong Yang, Lijie Wen, Jianmin Wang, Raymond K. Wong. CPL+: An improved approach for evaluating the local completeness of event logs. Information Processing Letters, 2014
- Jianmin Wang, Shaoxu Song, Xiaochen Zhu, Xuemin Lin. Efficient Recovery of Missing Events. Proceedings of the VLDB Endowment, PVLDB 6(10): 841-852 (2013)
- Tao Jin, Jianmin Wang, Marcello La Rosa, Arthur H. M. ter Hofstede, Lijie Wen. Efficient querying of large process model repositories. Computers in Industry, 2013
- Jianmin Wang, Raymond K. Wong, Jianwei Ding, Qinlong Guo, Lijie Wen. Efficient Selection of Process Mining Algorithms. IEEE Transactions on Services Computing, 2013
- Liang Song, Jianmin Wang, Lijie Wen, Hui Kong. Efficient Semantics-Based Compliance Checking Using LTL Formulae and Unfolding. Journal of Applied Mathematics, 2013
- Zhaoxia Wang, Jianmin Wang, Xiaochen Zhu, Lijie Wen. Verification of workflow nets with transition conditions. Journal of Zhejiang University - Science C, 2012
- Haiping Zha, Wil M. P. van der Aalst, Jianmin Wang, Lijie Wen, Jiaguang Sun. Verifying workflow processes: a transformation-based approach. Software and System Modeling, 2011
- Haiping Zha, Jianmin Wang, Lijie Wen, Chaokun Wang, Jiaguang Sun. A workflow net similarity measure based on transition adjacency relations. Computers in Industry, 2010
- Lijie Wen, Jianmin Wang, Wil M. P. van der Aalst, Biqing Huang, Jiaguang Sun. Mining process models with prime invisible tasks. Data & Knowledge Engineering, 2010
- Lijie Wen, Jianmin Wang, Wil M. P. van der Aalst, Biqing Huang, Jiaguang Sun. A novel approach for process mining based on event types. Journal of Intelligent Information Systems, 2009
- Lijie Wen, Wil M. P. van der Aalst, Jianmin Wang, Jiaguang Sun. Mining process models with non-free-choice constructs. Data Mining and Knowledge Discovery, 2007
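- Ming Yin; Lijie Wen; Jianmin Wang; Haiping Zha; Yingbo Liu; Zihe Dong. A method for detecting the working state of mechanical equipment, 2014/05/27. Tsinghua University, patent, application no. 201410225173.3
- Process Data Management and Analysis Mining Software V1.0, Tsinghua University, software copyright, registration no. 2014SR190808
- Process Pattern and Fragment Optimization Analysis Tool Software V1.0, Tsinghua University, software copyright, registration no. 2014SR190823
- Online Data Process Management Platform V1.0, Tsinghua University, software copyright, registration no. 2014SR190811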
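Research Topics
- Content-based instance retrieval: Given a query instance image (such as a car, a product logo, or a building), quickly and accurately locate that instance in a massive video collection using visual features. The work involves visual feature extraction, BoW (bag-of-words) vocabulary training, and high-dimensional visual indexing, and draws on machine learning, pattern recognition, and multimedia retrieval. It is currently a fairly cutting-edge direction.
- Multimedia semantic analysis: Combine the semantic attributes of images or videos with machine learning techniques such as matrix factorization and sparse coding to study how to extract attributes that are robust, generalize well, and are highly discriminative, in order to recognize the high-level semantic information of images or videos and achieve semantic recognition of multimedia data. On top of this high-level semantic information, focus on applied research such as object detection and recognition and video event detection.
- Attribute-based person retrieval and recognition: Use face detection and human body region segmentation to study methods for recognizing person attributes. In multi-attribute queries, combine extreme value theory, the Weibull distribution, and learning algorithms for attribute correlation to improve the accuracy of attribute fusion, and thus obtain more accurate retrieval and recognition results.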
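Research Projects
- National Natural Science Foundation of China project: Key Technologies for Data-Driven Large-Scale Automatic Image Annotation
- National Natural Science Foundation of China project: Key Technologies for Copy Detection in Large-Scale Video Data
- Industry-sponsored project: Massive Audio and Video Processing and Retrieval System
- Industry-sponsored project: Key Technologies for Multimedia Big Data Retrieval and Analysis
- Industry-sponsored project: Big Data Management Technology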
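Research Awards
- 2013: Jiangsu Province Science and Technology Award, Third Prize, for "Research, Development, and Industrialization of Key Technologies and a Platform for Cloud-Based Storage and Processing of Massive Internet of Things Data".
- 2010: National Science and Technology Progress Award, Second Prize, for "Key Streaming Media Technologies and Equipment for Large-Scale Metropolitan Surveillance".
- 2010: Chinese Institute of Electronics, Electronic Information Science and Technology Award, Third Prize, for "Key Technologies and Applications of Real-Time Behavior Analysis for Video Surveillance".
- 2007: Guangdong Province Science and Technology Award, Second Prize, for "IPTV Integrated Service System".
Research Results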
- Jile Zhou, Guiguang Ding, Yuchen Guo, Qiang Liu, XinPeng Dong, Kernel-Based Supervised Hashing for Cross-View Similarity Search, ICME 2014
- Zijia Lin, Guiguang Ding, Mingqing Hu, Jianmin Wang, Multi-label Classification via Feature-aware Implicit Label Space Encoding, ICML 2014
- Zijia Lin, Guiguang Ding, Mingqing Hu, Yunzhen Lin, Shuzhi Sam Ge, Image Tag Completion via Dual-view Linear Sparse Reconstructions, CVIU 2014
- Jile Zhou, Guiguang Ding, Yuchen Guo, Latent Semantic Sparse Hashing for Cross-Modal Similarity Search, SIGIR 2014.
- Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu, Transfer Joint Matching for Visual Domain Adaptation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014).
- Guiguang Ding, Yuchen Guo, Jile Zhou, Collective Matrix Factorization Hashing for Multimodal Data, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014).
- Zijia Lin, Guiguang Ding, Mingqing Hu, Image Auto-annotation via Tag-dependent Random Search over Range-constrained Visual Neighbours, Multimedia Tools and Applications (2014).
- Zijia Lin, Guiguang Ding, Mingqing Hu, Multi-source Image Auto-annotation, ICIP 2013: 2567-2571 (Oral, Top 10% Paper)
- Z. Lin, G. Ding, M. Hu, et al. Image Tag Completion via Image-Specific Linear Sparse Reconstructions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013).
- M. Long, G. Ding, J. Wang, Philip Yu, Transfer Sparse Coding for Robust Image Representation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013).
- J Shi, M Long, Q Liu, G Ding, J Wang, Twin Bridge Transfer Learning for Sparse Collaborative Filtering, Advances in Knowledge Discovery and Data Mining, 496-507
- W Zhang, G Ding, L Chen, C Li, C Zhang, Generating virtual ratings from Chinese reviews to augment online recommendations, ACM Transactions on Intelligent Systems and Technology (TIST) 4 (1), 9.
- Z. Lin, G. Ding, M. Hu, J. Wang, J. Sun, Automatic image annotation using tag-related random search over visual neighbors, In Proceedings of the 21st ACM international conference on Information and knowledge management, CIKM '12.
- M. Long, J. Wang, G. Ding, D. Shen, Q. Yang, Transfer Learning with Graph Co-Regularization, In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI '12, pp. 1033-1039.
- M Long, J Wang, G Ding, W Cheng, X Zhang, W Wang, Dual transfer learning, 12th SIAM International Conference on Data Mining (SDM 2012).
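- Guiguang Ding; Zijia Lin. Method and apparatus for automatic image annotation based on random walks over a tag graph model, 2011/12/28, 11, National Patent Office of the People's Republic of China, patent CN201110147140.8
- Guiguang Ding; Zijia Lin. Method and apparatus for automatic image annotation based on non-uniform random search over a directed graph, 2011/12/28, 11, National Patent Office of the People's Republic of China, patent CN201110147033.5
- Guiguang Ding; Zijia Lin; Hailong Wen; Jianmin Wang. Distributed parallel query framework for high-dimensional indexes based on a peer-to-peer structure, 2012/8/1, 11, National Patent Office of the People's Republic of China, patent CN201210038115.0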
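Research Topics
- Machine learning for Internet data: Considering the unstructured, dynamic, and massive nature of Internet data, study the corresponding machine learning and data mining techniques, including topic models, recommendation techniques, deep learning, and clustering methods.
- Analysis and mining of Internet information: Based on machine learning and data mining, and targeting unstructured data such as Internet text, study query and retrieval methods for Internet information, information propagation and evolution models, and opinion analysis and sentiment mining methods.
- Modeling and analysis of complex social data: Study how to extract the structural and instance features of social networks, automatically construct latent associations between network elements, and discover global patterns of network generation and evolution. For complex social networks and complex social media data such as music, carry out effective data modeling, operator implementation, query processing, interactive fusion, and analysis and recommendation, to effectively support the integration of complex heterogeneous social media data.
Research Results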
- C. Wan, X. Jin, G. Ding, D. Shen. Gaussian Cardinality Restricted Boltzmann Machines. Proc. 29th AAAI Conf. on Artificial Intelligence (AAAI). 2015.
- Jun Zhang, Chaokun Wang, Jianmin Wang, and Jeffrey Xu Yu. Inferring Continuous Dynamic Social Influence and Personal Preference for Temporal Behavior Prediction. PVLDB 2014
- Jun Zhang, Chaokun Wang, and Jianmin Wang. Who Proposed the Relationship? --- Recovering the Hidden Directions of Undirected Social Networks. WWW 2014
- Jun Chen, Chaokun Wang, and Jianmin Wang. Modeling the Interest-Forgetting Curve for Music Recommendation. ACM Multimedia 2014
- Jun Zhang, Chaokun Wang, and Jianmin Wang. Learning Temporal Dynamics of Behavior Propagation in Social Networks. AAAI 2014
- Jun Chen, Chaokun Wang, Lei Yang, Qingfu Wen, and Xu Wang. MiSCon: A Hot Plugging Tool for Real-time Motion-based System Control. ACM Multimedia 2014
- Raymond Y. K. Lau, Chunping Li, Stephen S. Y. Liao: Social analytics: Learning fuzzy product ontologies for aspect-oriented sentiment analysis. Decision Support Systems 65: 80-94, 2014
- Wenping Zhang, Raymond Y.K. Lau, Chunping Li, Adaptive Big Data Analytics for Deceptive Review Detection in Online Social Media, Proceedings of International Conference on Information Systems (ICIS), 2014
- W. Cheng, X. Jin, J. Sun, X. Lin, X. Zhang, W. Wang. Searching Dimension Incomplete Databases. IEEE Transactions on Knowledge and Data Engineering (TKDE). 26(3): 725-738, 2014.
- Jun Zhang, Chaokun Wang, Yuanchi Ning, Yichi Liu, Jianmin Wang, and Philip Yu. LaFT-Explorer: Inferring, Visualizing and Predicting How Your Social Network Expands. ACM SIGKDD 2013.
- Yiyuan Bai, Chaokun Wang, Yuanchi Ning, Hanzhao Wu, and Hao Wang. G-Path: Flexible Path Pattern Query on Large Graphs. WWW 2013.
- Jun Zhang, Chaokun Wang, Philip Yu, and Jianmin Wang. Learning Latent Friendship Propagation Networks with Interest Awareness for Link Prediction. ACM SIGIR 2013.
- X. Ding, X. Jin, Y. Li, L. Li. Celebrity Recommendation with Collaborative Social Topic Regression. Proc. 23rd Intl. Joint Conf. on Artificial Intelligence (IJCAI). 2013.
- Tong Zhao, Chunping Li, Mengya Li, Social Recommendation Incorporating Topic Mining and Social Trust Analysis, In Proceedings of ACM CIKM 2013: 1643-1648
- Yajie Miao, Chunping Li, Jie Tang, Lili Zhao: Identifying new categories in community question answering archives: a topic modeling approach. In Proceedings of ACM CIKM 2010: 1673-1676
- Ying Liu, Hui Zhang, Chunping Li, Roger Jianxin Jiao: Workflow simulation for operational decision support using event graph through process mining. Decision Support Systems 52(3): 685-697 (2012)
- L. Li, X. Jin, S. Pan, J. Sun. Multi-Domain Active Learning for Text Classification. Proc. 18th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD). 2012.
- L. Li, X. Jin, M. Long. Topic Correlation Analysis for Cross-Domain Text Classification. Proc. 26th AAAI Conf. on Artificial Intelligence (AAAI). 2012.
- X. Wang, X. Jin, M. Chen, K. Zhang, D. Shen. Topic mining over asynchronous text sequences. IEEE Transactions on Knowledge and Data Engineering (TKDE), 24(1), 2012.
- M. Chen, X. Jin, D. Shen. Short Text Classification Improved by Learning Multi-Granularity Topics. Proc. 22nd Intl. Joint Conf. on Artificial Intelligence (IJCAI). 2011.
- Zhang Liu, Chaokun Wang, Yiyuan Bai, Hao Wang, and Jianmin Wang. MUSIZ: A Generic Framework for Music Resizing with Stretching and Cropping. ACM Multimedia 2011.
- Peng Zou, Chaokun Wang, Zhang Liu, Jianmin Wang, and Jia-Guang Sun. A Cloud based SIM DRM Scheme for the Mobile Internet. ACM CCS 2010.
- Chaokun Wang, Jianmin Wang, Xuemin Lin, Wei Wang, Haixun Wang, Hongsong Li, Wanpeng Tian, Jun Xu, and Rui Li. MapDupReducer: Detecting Near Duplicates over Massive Datasets. ACM SIGMOD 2010.
- Zhang Liu, Chaokun Wang, Jianmin Wang, Wei Zheng, and Shengfei Shi. Structure-Aware Music Resizing Using Lyrics. WWW 2010.
- Y. Zhang, X. Jin. Concept Sampling: Towards Systematic Selection in Large-Scale Mixed Concepts in Machine Learning. Proc. 20th Intl. Joint Conf. on Artificial Intelligence (IJCAI). 2007.
- X. Jin, X. Zuo, K. Lam, J. Wang, J. Sun. Efficient Discovery of Emerging Frequent Patterns in Arbitrary Windows on Data Streams. Proc. 22nd Intl. Conf. on Data Engineering (ICDE). 2006.
- X. Jin, Y. Lu, C. Shi. Similarity Measure Based on Partial Information of Time series. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD). 2002.
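- Chunping Li, Song Gao, Yibin Wang, Ming Gu, 古川和年, 阿部昌平. Web page relation evaluation apparatus for detecting information propagation, ZL 200910092356.1, patent granted in China and Japan
- Chunping Li, Yibin Wang, 阿部昌平. Search assistance method and search assistance program, ZL 201010140447.0, patent granted in China and Japan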
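Research Topics
(1) Free-table-based big data storage framework for industrial condition monitoring
The free-table-based big data storage framework for industrial condition monitoring is a dedicated storage system custom-built to meet the industrial condition monitoring needs of SANY Heavy Industry. Within the condition monitoring and operations-and-maintenance service support system, its role is to provide high-capacity, stable, and reliable storage for raw monitoring data, and to provide query and analysis support for the monitoring analysis subsystem.
(2) Free-table-based organization of monitoring data
SANY Heavy Industry's existing monitoring system stores its data in an Oracle cluster and organizes it with the table model of a traditional relational database. This model limits the number of columns in a table and makes adding or removing columns difficult. The sheer volume of table data also makes queries slow and some statistical analyses infeasible. The free-table data model converts the original single long table into multiple wide tables, which reduces query complexity, improves performance, and makes statistical analysis possible.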
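The conversion from one long table to several wide tables can be pictured with a small pivot. The text does not describe the free-table schema, so everything here is an assumption for illustration: the row layout `(device_type, device_id, timestamp, metric, value)`, the function name `long_to_wide`, and the choice of one wide table per device type.

```python
from collections import defaultdict


def long_to_wide(rows):
    """Pivot long-format readings into one wide table per device type.

    Each input row is (device_type, device_id, timestamp, metric, value).
    Each output table maps (device_id, timestamp) to a dict of
    metric -> value, so metrics may differ freely between rows.
    """
    tables = defaultdict(dict)
    for device_type, device_id, timestamp, metric, value in rows:
        record = tables[device_type].setdefault((device_id, timestamp), {})
        record[metric] = value
    return dict(tables)
```

In this layout, reading all metrics of one device at one instant is a single key lookup instead of a scan over many long-table rows, and a new metric needs no schema change.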
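(3) Big-data-based behavior analysis
Using big data techniques to address both real-time analysis and large-scale offline analysis when many devices run simultaneously, we propose a dedicated indicator analysis library and an orthogonal-partitioning distributed computing method for industrial big data, enabling fast computation over data on the order of ten billion records.
On this basis, the behaviors of a large number of products are pooled for cross-product comparison. Parallel computation over the data quickly reveals how products in service are behaving. Abnormally active and abnormally sluggish behaviors become the focus of attention, and severely abnormal behaviors can be met with proactive intervention to keep the situation from deteriorating further.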
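One simple way to single out abnormally active and abnormally sluggish products in such a cross-product comparison is a robust deviation score. This is a sketch under assumptions, not the indicator library described above: the function name `flag_outliers`, the median/MAD scoring rule, and the default threshold are all illustrative choices.

```python
from statistics import median


def flag_outliers(activity_by_device, threshold=3.0):
    """Label devices whose activity deviates strongly from the fleet median.

    The score is (value - median) / MAD, where MAD is the median absolute
    deviation; scores above +threshold are "overactive", scores below
    -threshold are "sluggish".
    """
    values = list(activity_by_device.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return {}  # no spread in the fleet, so nothing can be ranked
    flags = {}
    for device, value in activity_by_device.items():
        score = (value - center) / mad
        if score > threshold:
            flags[device] = "overactive"
        elif score < -threshold:
            flags[device] = "sluggish"
    return flags
```

Because the median and MAD are barely affected by a few extreme devices, the extreme devices themselves do not mask one another the way they can with a mean and standard deviation.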
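Research Topics
(1) Goal-driven extraction of oral health data
Health services are fundamentally knowledge-intensive services. Healthcare delivery today continuously generates and consumes large amounts of data, and isolated data stores combined with explosive data growth sharply drive up the cost of information management and integration. For isolated data scattered across different databases, we study goal-driven techniques, methods, and supporting platforms for acquiring, storing, querying, using, and updating healthcare information and knowledge. Ultimately, new data analysis techniques turn these data into usable knowledge and decision evidence for healthcare institutions and patients, creating new value.
(2) Health data security and privacy protection
The security of health data and the protection of patient privacy are widely discussed topics in the health field in China and abroad. The United States enacted a dedicated health information privacy law, the Health Insurance Portability and Accountability Act (HIPAA), which requires the information systems of healthcare institutions to comply with it before they can be put into use. As electronic medical records and health records become increasingly widespread, the collection of and access to patients' personal data should be handled with greater care, and robust, effective information security defenses are needed to protect patient privacy. Access to personal data must pass strict access control and authorization, and teaching and research data and other group data may be used for statistics and analysis only after individual identifying characteristics have been removed. We have developed a patient-centered, mobile, shareable, and exchangeable oral health data service system that shares data only with the patient's authorization, and that ensures a software and hardware platform designed for current needs can also meet future needs for sharing and applying oral healthcare information.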
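Removing individual identifying characteristics before research use can be sketched as follows. This is a minimal illustration under assumptions, not the system described above: the field names in `DIRECT_IDENTIFIERS`, the `patient_id` key, and the function name `deidentify` are hypothetical, and real de-identification also has to deal with quasi-identifiers such as dates and locations.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with a pseudonym.

    The pseudonym is a keyed hash (HMAC-SHA256) of the patient ID, so the
    same patient maps to the same pseudonym under one key, while the ID
    cannot be recomputed without that key.
    """
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "patient_id"
    }
    digest = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned["pseudonym"] = digest[:16]
    return cleaned
```

A keyed hash is used instead of a plain hash so that someone who knows the range of possible patient IDs cannot rebuild the mapping by hashing each candidate.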
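(3) Mobile healthcare platform and application development
In healthcare, the use of mobile devices (smartphones and tablets) has become the latest trend. New mobile health applications launch every day, and they significantly affect how users consume health information and how efficient the process of seeking care is. Through highly available, portable, and convenient mobile services, physicians can improve their workflows and make more efficient use of software applications and information assets. The benefit of mobile devices is that physicians can provide services anytime and anywhere, which makes these devices a key enabling tool for efficient, low-cost health services. Physicians can monitor a patient's condition at any time and from anywhere, and patients can at any time receive their physician's diagnostic advice and guidance on treatment, lifestyle habits, and daily health-promotion activities.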