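Conference Program
2024-12-14 (Day 1) |
8:45 - 9:00 | Opening: General Chairs and Program Committee Chairs |
9:00 - 9:10 | ChinaSys Rising Star Award and Outstanding Doctoral Dissertation Award Ceremony |
Keynote I (Session Chair: 章明星, Tsinghua University) |
9:10 - 9:50 | Industrial Practice of High-Performance, Safe Concurrency | Speaker: 付明 (Huawei) |
9:50 - 10:30 | Architecture Design of a Specialized AI Processor Chip for Visual Multimodal Models | Speaker: 梁晓峣 (Shanghai Jiao Tong University) |
Coffee Break (10:30 - 10:40) |
Session 1: Serverless (Session Chair: 庞浦, Shanghai Jiao Tong University) |
10:40 - 10:57 | Derm: SLA-aware Resource Management for Highly Dynamic Microservices | Speaker: 陈钌 (University of Macau) |
10:57 - 11:14 | TrEnv: Transparently Share Serverless Execution Environments Across Different Functions and Nodes | Speaker: 黄嘉良 (Tsinghua University, Alibaba) |
11:15 - 12:00 | Lightning Talk I |
Lunch Break & Poster Session (12:00 - 13:30) |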
Session 2: Operating Systems (Session Chair: 杜冬冬, Shanghai Jiao Tong University) |
13:30 - 13:47 | Understanding the Linux Kernel, Visually | Speaker: 刘瀚之 (Nanjing University) |
13:47 - 14:04 | Skyloft: A General High-Efficient Scheduling Framework in User Space | Speaker: 田凯夫 (Tsinghua University) |
14:04 - 14:21 | Fast Core Scheduling with Userspace Process Abstraction | Speaker: 林家桢 (Tsinghua University) |
14:21 - 14:38 | BULKHEAD: Secure, Scalable, and Efficient Kernel Compartmentalization with PKS | Speaker: 郭迎港 (Nanjing University) |
Session 3: Storage Systems (Session Chair: 张峰, Renmin University of China) |
14:38 - 14:55 | CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory | Speaker: 罗旭川 (Fudan University) |
14:55 - 15:12 | Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory | Speaker: 杨涵章 (Shanghai Jiao Tong University) |
15:12 - 15:29 | NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering | Speaker: 周哲 (Huawei) |
15:29 - 15:46 | Mirage: Generating Enormous Databases for Complex Workloads | Speaker: 黄煦华 (East China Normal University) |
Coffee Break (15:46 - 16:00) |
Session 4: Machine Learning Systems (Session Chair: 张钊宁, National University of Defense Technology) |
16:00 - 16:17 | Parrot: Efficient Serving of LLM-based Applications with Semantic Variable | Speaker: 林超凡 (Tsinghua University) |
16:17 - 16:34 | Llumnix: Dynamic Scheduling for Large Language Model Serving | Speaker: 赵汉宇 (Alibaba) |
16:34 - 16:51 | Heterogeneous Collaborative Speculative Decoding: An Acceleration Method for LLM on Personal Devices | Speaker: 张立博 (National University of Defense Technology) |
16:51 - 17:08 | In-Storage Attention Offloading for Efficient Long-Context LLM Inference | Speaker: 李恩典 (Peking University) |
Session 5: Industry (Session Chair: 马腾, Alibaba) |
17:08 - 17:25 | OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster | Speaker: 徐泉清 (Ant Group) |
17:25 - 17:42 | Heterogeneous Converged OS: Operating System Innovation and Challenges in the Era of Heterogeneous Intelligent Computing | Speaker: 林飞龙 (Huawei) |
17:42 - 17:59 | GraphUniverse: A New Paradigm for Data Management and Analytics | Speaker: 林恒 (Ant Group) |
Poster Session (18:00 - 18:30) |
Banquet (18:30, Second-Floor Ballroom, 东凯悦酒店) |
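2024-12-15 (Day 2) |
Keynote II (Session Chair: 宋卓然, Shanghai Jiao Tong University) |
9:00 - 9:40 | Low-Overhead Reversible Cache Coherence Protocol | Speaker: 钱学海 (Tsinghua University) |
9:40 - 10:20 | Enabling Edge-Device Deployment of Foundation Models | Speaker: 曹婷 (Microsoft Research Asia) |
Best Presentation Award Ceremony |
Coffee Break (10:20 - 10:30) |
Session 6: Best Paper (Session Chair: 宋新开, Institute of Computing Technology, CAS) |
10:30 - 10:47 | Serialization/Deserialization-free State Transfer in Serverless Workflows | Speaker: 魏星达 (Shanghai Jiao Tong University) |
10:47 - 11:04 | AmgT: Algebraic Multigrid Solver on Tensor Cores | Speaker: 曾礼杰 (China University of Petroleum, Beijing) |
11:04 - 11:21 | PolarDB-MP: A Multi-Primary Cloud-Native Database via Disaggregated Shared Memory | Speaker: 章颖强 (Alibaba Cloud) |
11:21 - 12:00 | Lightning Talk II |
Lunch Break (12:00 - 13:30) |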
Session 7: Computer Architecture (Session Chair: 周哲, Huawei) |
13:30 - 13:47 | Constructing Block-Interface All-Flash Array with Zoned-Namespace SSDs | Speaker: 彭力 (Peking University) |
13:47 - 14:04 | ChameleonEC: Exploiting Tunability of Erasure Coding for Low-Interference Repair | Speaker: 蔡煜晖 (Xiamen University) |
14:04 - 14:21 | A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs | Speaker: 王庆刚 (Huazhong University of Science and Technology) |
14:21 - 14:38 | Gaze into the Pattern: Characterizing Spatial Patterns with Footprint-Internal Correlations for Hardware Prefetching | Speaker: 陈子啸 (Shanghai Jiao Tong University) |
Session 8: High-Performance Computing (Session Chair: 甘一鸣, Institute of Computing Technology, CAS) |
14:38 - 14:55 | Exploring Hierarchical Patterns for Alert Aggregation in Supercomputers | Speaker: 孙永谦 (Nankai University) |
14:55 - 15:12 | Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators | Speaker: 蔡经纬 (Tsinghua University) |
15:12 - 15:29 | Towards Highly Compatible I/O-aware Workflow Scheduling on HPC Systems | Speaker: 唐宇 (National University of Defense Technology) |
15:29 - 15:46 | RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator | Speaker: 张杰 (Zhejiang University) |
Coffee Break (15:46 - 16:00) |
Session 9: GPU (Session Chair: 周迪宇, Peking University) |
16:00 - 16:17 | Efficient 4-bit Matrix Unit via Primitivization | Speaker: 陈亦 (Institute of Computing Technology, CAS) |
16:17 - 16:34 | DTuner: Efficiently Tuning and Compiling for Dynamic Shape Tensor Programs | Speaker: 刘硕 (University of Science and Technology of China) |
16:34 - 16:51 | PHOENIXOS: A Concurrent OS-level GPU Checkpoint and Restore System | Speaker: 黄卓彬 (Shanghai Jiao Tong University) |
16:51 - 17:08 | Hydrogen: Contention-Aware Hybrid Memory for Heterogeneous CPU-GPU Architectures | Speaker: 李一苇 (Tsinghua University) |
Session 10: Accelerator (Session Chair: 张余豪, Tianjin University) |
17:08 - 17:25 | WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores | Speaker: 范广 (Ant Technology Research Institute) |
17:25 - 17:42 | ActiveN: A Scalable and Flexibly-programmable Event-driven Neuromorphic Processor | Speaker: 刘晓义 (Tsinghua University) |
17:42 - 17:59 | Hassert: Hardware Assertion-Based Agile Verification Framework with FPGA Acceleration | Speaker: 张子卿 (Institute of Computing Technology, CAS) |
17:59 - 18:16 | Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning | Speaker: 翟祎 (University of Science and Technology of China) |
Closing (18:16 - 18:21) |
2024-12-14 (Day 1) | Lightning Talk I |
11:15 - 11:16 | Trinity: A General Purpose FHE Accelerator | Speaker: 邓翔龙 (Institute of Information Engineering, CAS) |
11:16 - 11:17 | SQLStateGuard: Statement-Level SQL Injection Defense Based on Learning-Driven Middleware | Speaker: 王天一 (Lanzhou University) |
11:17 - 11:18 | Beaver: A High-Performance and Crash-Consistent File System Cache via PM-DRAM Collaborative Memory Tiering | Speaker: 潘庆霖 (Institute of Software, CAS) |
11:18 - 11:19 | A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules | Speaker: 梁超毅 (Fudan University) |
11:19 - 11:20 | Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm | Speaker: 林夕 (Nanjing University) |
11:20 - 11:21 | Mobilizing underutilized storage nodes via job path: A job-aware file striping approach | Speaker: 鲜港 (China Aerodynamics Research and Development Center) |
11:21 - 11:22 | EZTopo: An Automatic Custom Topology Generation Framework for Large Scale NoC Design | Speaker: 唐岩 (National University of Defense Technology) |
11:22 - 11:23 | VertexSurge: Variable Length Graph Pattern Match on Billion-edge Graphs | Speaker: 谢威宇 (Tsinghua University) |
11:23 - 11:24 | GPU Performance Optimization via Inter-group Cache Cooperation | Speaker: 王国升 (Wuhan University of Technology) |
11:24 - 11:25 | CROSS: Compiler-Driven Optimization of Sparse DNNs Using Sparse/Dense Computation Kernels | Speaker: 黄世远 (Shanghai Jiao Tong University) |
11:25 - 11:26 | DDP-Fsim: Efficient and Scalable Fault Simulation for Deterministic Patterns with Two-Dimensional Parallelism | Speaker: 谷丰 (Institute of Computing Technology, CAS) |
11:26 - 11:27 | Towards Hotness-aware Object Locality Optimization for Memory Tiering | Speaker: 黄瑞哲 (Peking University) |
11:27 - 11:28 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Speaker: 魏剑宇 (Microsoft Research Asia) |
11:28 - 11:29 | Fast State Restoration in LLM Serving with HCache | Speaker: 高世伟 (Tsinghua University) |
11:29 - 11:30 | UNICO: Unifying Coroutines in Rust | Speaker: 徐启航 (Peking University) |
11:30 - 11:31 | A Node Failure and Anomaly Prediction Method for Supercomputing Systems | Speaker: 赵一宁 (Computer Network Information Center, CAS) |
11:31 - 11:32 | Scaling Disk Failure Prediction via Multi-Source Stream Mining | Speaker: 韩淑捷 (Northwestern Polytechnical University) |
11:32 - 11:33 | Sparsification-Driven Accelerator Design for Video Generation Models | Speaker: 刘军 (Shanghai Jiao Tong University) |
11:33 - 11:34 | TB-STC: Transposable Block-wise N:M Structured Sparse Tensor Core | Speaker: 刘军 (Shanghai Jiao Tong University) |
11:34 - 11:35 | Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs | Speaker: 田冰 (Huazhong University of Science and Technology) |
11:35 - 11:36 | Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment | Speaker: 杨非 (Zhejiang Lab) |
11:36 - 11:37 | FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale | Speaker: 朱泽雨 (Institute of Automation, CAS) |
11:37 - 11:38 | BulkCC: Scaling Concurrent Data Structures to GPU Parallelism | Speaker: 芮轲 (Institute of Computing Technology, CAS) |
11:38 - 11:39 | SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing | Speaker: 卢澄志 (University of Macau) |
11:39 - 11:40 | HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs | Speaker: 杨建超 (National University of Defense Technology) |
11:40 - 11:41 | M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type | Speaker: 胡洧铭 (Shanghai Jiao Tong University) |
11:41 - 11:42 | VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference | Speaker: 刘子汉 (Shanghai Jiao Tong University) |
11:42 - 11:43 | COMPASS: SRAM-Based Computing-in-Memory SNN Accelerator with Adaptive Spike Speculation | Speaker: 汪宗武 (Shanghai Jiao Tong University) |
11:43 - 11:44 | Scaling Up Memory Disaggregated Applications With SMART | Speaker: 任峰 (Qiyuan Lab) |
11:44 - 11:45 | Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications | Speaker: 朱立人 (Peking University) |
11:45 - 11:46 | Flame: A Centralized Cache Controller for Serverless Computing | Speaker: 杨亚南 (China Telecom Cloud Computing Research Institute) |
2024-12-15 (Day 2) | Lightning Talk II |
11:22 - 11:23 | Stream-Based Data Placement for Near-Data Processing with Extended Memory | Speaker: 李一苇 (Tsinghua University) |
11:23 - 11:24 | MV4PG: Materialized Views for Property Graphs | Speaker: 徐柴俊 (University of Science and Technology of China) |
11:24 - 11:25 | Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle Inequalities | Speaker: 许骞 (Renmin University of China) |
11:25 - 11:26 | SparSynergy: Unlocking Flexible and Efficient DNN Acceleration through Multi-Level Sparsity | Speaker: 杨靖奎 (National University of Defense Technology) |
11:26 - 11:27 | gVulkan: Scalable GPU Pooling for Pixel-Grained Rendering in Ray Tracing | Speaker: 顾翼成 (Shanghai Jiao Tong University) |
11:27 - 11:28 | On-demand and Parallel Checkpoint/Restore for GPU Applications | Speaker: 杨彦凝 (Shanghai Jiao Tong University) |
11:28 - 11:29 | WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training | Speaker: 林俊峰 (Tsinghua University) |
11:29 - 11:30 | NodeSentry: Unsupervised Anomaly Detection in Production HPC Systems Using Model Sharing | Speaker: 孙永谦 (Nankai University) |
11:30 - 11:31 | Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs | Speaker: 杨德闯 (China University of Petroleum, Beijing) |
11:31 - 11:32 | Medusa: Accelerating Serverless LLM Inference with Materialization | Speaker: 曾少勋 (Tsinghua University) |
11:32 - 11:33 | UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures | Speaker: 谢童欣 (Tsinghua University) |
11:33 - 11:34 | LegoZK: A Dynamically Reconfigurable Accelerator for Zero-Knowledge Proof | Speaker: 杨正帮 (Institute of Information Engineering, CAS) |
11:34 - 11:35 | Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling | Speaker: 张欣怡 (East China Normal University) |
11:35 - 11:36 | PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware | Speaker: 蔡双羽 (Tsinghua University) |
11:36 - 11:37 | WinAcc: Window-based Acceleration of Neural Networks Using Block Floating Point | Speaker: 鞠鑫 (National University of Defense Technology) |
11:37 - 11:38 | Topology-aware Preemption for Co-located LLM Workloads | Speaker: 张平 (Baichuan AI) |
11:38 - 11:39 | EDGaE: Efficient Distributed GNN Training System at Edge | Speaker: 靳松 (Wuhan University) |
11:39 - 11:40 | Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation | Speaker: 王磊 (Microsoft Research Asia) |
11:40 - 11:41 | Componentized OS: Flexible and Efficient Architecture for Specialized Operating Systems | Speaker: 蒋周奇 (Huazhong University of Science and Technology) |
11:41 - 11:42 | DRack: A CXL-Disaggregated Rack Architecture to Boost Inter-Rack Communication | Speaker: 张旭 (Institute of Computing Technology, CAS) |
11:42 - 11:43 | Improving the Ability of Thermal Radiation Based Hardware Trojan Detection | Speaker: 苏颋 (National University of Defense Technology) |
11:43 - 11:44 | Constructing a Supplementary Benchmark Suite to Represent Android Applications with User Interactions by Using Performance Counters | Speaker: 欧阳铖浩 (Shenzhen Institutes of Advanced Technology, CAS) |
11:44 - 11:45 | Performance Analysis of Light-weight Memory Isolation Methods | Speaker: 裴辰举 (Jilin University) |
11:45 - 11:46 | Efficient LLM Inference on GPUs with Operator Optimization and Compilation | Speaker: 许珈铭 (Shanghai Jiao Tong University) |
11:46 - 11:47 | Detecting Broken Object-Level Authorization Vulnerabilities in Database-Backed Applications | Speaker: 黄永恒 (Institute of Computing Technology, CAS) |
11:47 - 11:48 | Using Analytical Performance/Power Model and Fine-Grained DVFS to Enhance AI Accelerator Energy Efficiency | Speaker: 王梓博 (Nanjing University) |
2024-12-14 (Day 1) | Poster Session |
1 | Trinity: A General Purpose FHE Accelerator |
2 | SQLStateGuard: Statement-Level SQL Injection Defense Based on Learning-Driven Middleware |
3 | Beaver: A High-Performance and Crash-Consistent File System Cache via PM-DRAM Collaborative Memory Tiering |
4 | A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules |
5 | Exploring Hierarchical Patterns for Alert Aggregation in Supercomputers |
6 | Serialization/Deserialization-free State Transfer in Serverless Workflows |
7 | Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm |
8 | Mobilizing underutilized storage nodes via job path: A job-aware file striping approach |
9 | EZTopo: An Automatic Custom Topology Generation Framework for Large Scale NoC Design |
10 | Heterogeneous Collaborative Speculative Decoding: An Acceleration Method for LLM on Personal Devices |
11 | VertexSurge: Variable Length Graph Pattern Match on Billion-edge Graphs |
12 | Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning |
13 | GPU Performance Optimization via Inter-group Cache Cooperation |
14 | CROSS: Compiler-Driven Optimization of Sparse DNNs Using Sparse/Dense Computation Kernels |
15 | OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster |
16 | CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory |
17 | Parrot: Efficient Serving of LLM-based Applications with Semantic Variable |
18 | DDP-Fsim: Efficient and Scalable Fault Simulation for Deterministic Patterns with Two-Dimensional Parallelism |
19 | Towards Hotness-aware Object Locality Optimization for Memory Tiering |
20 | Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators |
21 | Accelerating Transparent Memory Compression in the Cloud |
22 | Derm: SLA-aware Resource Management for Highly Dynamic Microservices |
23 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge |
24 | Fast State Restoration in LLM Serving with HCache |
25 | RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator |
26 | UNICO: Unifying Coroutines in Rust |
27 | A Node Failure and Anomaly Prediction Method for Supercomputing Systems |
28 | Scaling Disk Failure Prediction via Multi-Source Stream Mining |
29 | Sparsification-Driven Accelerator Design for Video Generation Models |
30 | TB-STC: Transposable Block-wise N:M Structured Sparse Tensor Core |
31 | Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs |
32 | AmgT: Algebraic Multigrid Solver on Tensor Cores |
33 | A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs |
34 | Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory |
35 | Hydrogen: Contention-Aware Hybrid Memory for Heterogeneous CPU-GPU Architectures |
36 | Gaze into the Pattern: Characterizing Spatial Patterns with Footprint-Internal Correlations for Hardware Prefetching |
37 | WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores |
38 | Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment |
39 | FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale |
40 | BulkCC: Scaling Concurrent Data Structures to GPU Parallelism |
41 | Skyloft: A General High-Efficient Scheduling Framework in User Space |
42 | SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing |
43 | NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering |
44 | HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs |
45 | M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type |
46 | PolarDB-MP: A Multi-Primary Cloud-Native Database via Disaggregated Shared Memory |
47 | VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference |
48 | COMPASS: SRAM-Based Computing-in-Memory SNN Accelerator with Adaptive Spike Speculation |
49 | Scaling Up Memory Disaggregated Applications With SMART |
50 | Mirage: Generating Enormous Databases for Complex Workloads |
51 | Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications |
52 | Stream-Based Data Placement for Near-Data Processing with Extended Memory |
53 | MV4PG: Materialized Views for Property Graphs |
54 | DTuner: Efficiently Tuning and Compiling for Dynamic Shape Tensor Programs |
55 | Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle Inequalities |
56 | PHOENIXOS: A Concurrent OS-level GPU Checkpoint and Restore System |
57 | TrEnv: Transparently Share Serverless Execution Environments Across Different Functions and Nodes |
58 | SparSynergy: Unlocking Flexible and Efficient DNN Acceleration through Multi-Level Sparsity |
59 | gVulkan: Scalable GPU Pooling for Pixel-Grained Rendering in Ray Tracing |
60 | In-Storage Attention Offloading for Efficient Long-Context LLM Inference |
61 | On-demand and Parallel Checkpoint/Restore for GPU Applications |
62 | WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training |
63 | NodeSentry: Unsupervised Anomaly Detection in Production HPC Systems Using Model Sharing |
64 | Understanding the Linux Kernel, Visually |
65 | Constructing Block-Interface All-Flash Array with Zoned-Namespace SSDs |
66 | ActiveN: A Scalable and Flexibly-programmable Event-driven Neuromorphic Processor |
67 | Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs |
68 | Medusa: Accelerating Serverless LLM Inference with Materialization |
69 | ChameleonEC: Exploiting Tunability of Erasure Coding for Low-Interference Repair |
70 | UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures |
71 | LegoZK: A Dynamically Reconfigurable Accelerator for Zero-Knowledge Proof |
72 | Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling |
73 | PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware |
74 | WinAcc: Window-based Acceleration of Neural Networks Using Block Floating Point |
75 | BULKHEAD: Secure, Scalable, and Efficient Kernel Compartmentalization with PKS |
76 | Topology-aware Preemption for Co-located LLM Workloads |
77 | EDGaE: Efficient Distributed GNN Training System at Edge |
78 | Hassert: Hardware Assertion-Based Agile Verification Framework with FPGA Acceleration |
79 | Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation |
80 | Efficient 4-bit Matrix Unit via Primitivization |
81 | Componentized OS: Flexible and Efficient Architecture for Specialized Operating Systems |
82 | DRack: A CXL-Disaggregated Rack Architecture to Boost Inter-Rack Communication |
83 | Llumnix: Dynamic Scheduling for Large Language Model Serving |
84 | Flame: A Centralized Cache Controller for Serverless Computing |
85 | Improving the Ability of Thermal Radiation Based Hardware Trojan Detection |
86 | Towards Highly Compatible I/O-aware Workflow Scheduling on HPC Systems |
87 | Constructing a Supplementary Benchmark Suite to Represent Android Applications with User Interactions by Using Performance Counters |
88 | Fast Core Scheduling with Userspace Process Abstraction |
89 | Performance Analysis of Light-weight Memory Isolation Methods |
90 | Efficient LLM Inference on GPUs with Operator Optimization and Compilation |
91 | Detecting Broken Object-Level Authorization Vulnerabilities in Database-Backed Applications |
92 | DELTA: Memory-Efficient Training via Dynamic Fine-Grained Recomputation and Swapping |
93 | Using Analytical Performance/Power Model and Fine-Grained DVFS to Enhance AI Accelerator Energy Efficiency |