
Optimizing Data Storage for Application Performance Monitoring
Learn about choosing the right key for your data storage using Cassandra for application performance monitoring. Explore metrics, solutions, and the benefits of wide column stores like Apache Cassandra. Discover how to store and analyze log messages efficiently to enhance productivity and track performance metrics.
Presentation Transcript
Choosing the right key for your data
(Using Cassandra for application performance monitoring)
BigData Club
Rudolf Rakos, Akos Nagy-Zambo
September 12th, 2012

Agenda
- Introduction
- Metrics
- Possible solutions
- Our choice

prototype template (5428278)\print library_new_final.ppt 6/5/2025

Introduction
Our main goal is to increase our productivity when supporting applications.
- Our team is responsible for several applications that run on multiple hosts and execute critical operations on financial data.
- When issues occur, we need to trace messages across various log files on different hosts.
- Analyzing our logs could help us find patterns and outliers.
- We would like to be able to find or predict performance bottlenecks in the system.
  - Alerts (proactive)
  - Charts (reactive)
For this reason we aim to store our log messages in a centralized location. Indexing them will also help us retrieve exactly what we are looking for.

Metrics
For performance monitoring we need measurements. If we can measure and track certain attributes of our system, we can use those measurements to improve performance over time.
What makes metrics different?
- Aggregatable (average, min, max)
- Correlatable
- Immutable
Example metrics are:
- Message size
- Processing duration
- Calculations
- Database operations
- IO (disk, network)
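
To illustrate what "aggregatable" and "immutable" can mean in practice, here is a minimal Python sketch. The `Sample` type, the one-minute bucket size, and the example values are assumptions made for illustration; they are not taken from the presentation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a recorded sample is never modified
class Sample:
    timestamp: int   # seconds since the epoch (hypothetical unit)
    value: float     # e.g. a processing duration in milliseconds


def aggregate_by_minute(samples):
    """Group samples into one-minute buckets and compute min, max and average."""
    buckets = defaultdict(list)
    for s in samples:
        # Round the timestamp down to the start of its minute.
        buckets[s.timestamp - s.timestamp % 60].append(s.value)
    return {
        minute: {"min": min(v), "max": max(v), "avg": sum(v) / len(v)}
        for minute, v in sorted(buckets.items())
    }


# Made-up sample data: two samples in the first minute, one in the second.
samples = [Sample(0, 10.0), Sample(30, 20.0), Sample(61, 40.0)]
per_minute = aggregate_by_minute(samples)
```

Because the samples never change after they are recorded, the aggregates for a closed time bucket can be computed once and stored alongside the raw values.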

Possible solutions
There are various ways to store time series; here we compare two common solutions.

RDBMS
Pros
- Wide range of query options
- Easy integration with existing systems (relations, joins)
- ACID transactions
- Known performance tuning practices
- Good random IO performance
Cons
- Limited storage
- Limited distribution to nodes

Wide column stores
Pros
- Distributed
- Linear scalability
- Flexible consistency models (CAP theorem)
- Efficient IO
  - Compression
  - Sorting
  - Sequential reads
Cons
- Limited querying (by key, by key range)
- Limited transactions (only atomicity)
- Performance tuning at schema design time

Possible solutions
We believe wide column stores would serve us better in this case.
Well-known wide column stores:
- Amazon DynamoDB
- Amazon SimpleDB
- Apache Cassandra
- Apache HBase (Hadoop)
- Google Bigtable
Think of it like:
Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
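
The nested-map view can be sketched in a few lines of Python. This is only a toy in-memory model of the idea, not how any of the listed stores is implemented; the `WideRow` class, the row key string, and the sample values are hypothetical.

```python
import bisect


class WideRow:
    """One row: column keys are kept in sorted order, like a SortedMap."""

    def __init__(self):
        self._keys = []     # sorted list of column keys
        self._values = {}   # column key -> column value

    def put(self, column_key, value):
        if column_key not in self._values:
            bisect.insort(self._keys, column_key)
        self._values[column_key] = value

    def slice(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = {}  # outer map: row key -> row (rows themselves are not sorted)
row = store.setdefault("host1:duration", WideRow())
for ts, value in [(30, 2.5), (10, 1.0), (20, 4.0)]:
    row.put(ts, value)  # columns arrive out of order but are stored sorted
```

Since the columns inside a row stay sorted, a range of column keys (for example a time window) can be read as one contiguous slice, which is what makes this model attractive for time series.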

Our choice: Physical organization of the data

Rows
- Collection of (super) columns
- Partitioned by key (token ring)
- Not sorted by key (usually)

Keys
- Determines the physical location of the data (which node)
- Meaningful data (not just a name)

Super columns
- Collection of columns
- Similar to columns with composite keys
- Sorted by key

Columns
- Key/value pair
- Sorted by key
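
How a row key can determine the node that holds the data may be easier to see with a simplified token ring. The sketch below is an assumption-laden illustration: node tokens are derived by hashing hypothetical node names, MD5 is used only as a convenient stand-in hash, and replication is ignored. A real cluster assigns tokens and replicas according to its own configuration.

```python
import bisect
import hashlib


def token(text: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


class TokenRing:
    def __init__(self, node_names):
        # Simplification: each node's token is the hash of its (hypothetical) name.
        self._ring = sorted((token(name), name) for name in node_names)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, row_key: str) -> str:
        """Return the first node whose token is >= the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token(row_key))
        return self._ring[i % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("host1:duration:2012-09-12")
```

Because placement depends on a hash of the key, rows with neighbouring keys generally land on unrelated nodes, which is consistent with rows usually not being sorted by key.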

Our choice: Choosing the right key

Use cases
- Chart metrics by date, tags, aggregation
- Chart multiple metrics by date, tags, aggregation
- Chart multiple metrics aggregated on the fly
- Aggregate metrics by minutes or hours
- Aggregate metrics older than a month by minutes or hours
- Delete metrics older than a year
- Delete metrics aggregated by seconds older than a month

Things to consider
- Ease of use
- Efficiency
- Operations
  - Insertion
  - Querying
    - By date (range)
    - By date, tags, aggregation
    - By date, tags, aggregation, time range
  - Deletion
    - By date (range)
    - By date, tags, aggregation
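
The aggregation and deletion use cases can be read as a retention policy keyed on date and aggregation interval. The following Python sketch is one possible interpretation, not the authors' implementation: the interval labels, the 30-day and 365-day approximations, and the combined "aggregate-then-delete" action are all assumptions.

```python
from datetime import date, timedelta

MONTH = timedelta(days=30)    # assumption: a month approximated as 30 days
YEAR = timedelta(days=365)    # assumption: a year approximated as 365 days


def retention_action(row_date: date, interval: str, today: date) -> str:
    """Decide what to do with one day of metrics stored at a given interval.

    interval is one of "second", "minute", "hour" (hypothetical labels).
    """
    age = today - row_date
    if age > YEAR:
        return "delete"                  # delete metrics older than a year
    if age > MONTH and interval == "second":
        # Roll up to minutes or hours first, then drop the per-second data.
        return "aggregate-then-delete"
    return "keep"


today = date(2012, 9, 12)
actions = [
    retention_action(date(2011, 1, 1), "hour", today),
    retention_action(date(2012, 7, 1), "second", today),
    retention_action(date(2012, 7, 1), "minute", today),
]
```

A policy like this only needs the date and the aggregation interval, which is why both appear among the deletion criteria: if they are part of the key, old data can be located without scanning values.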

Our choice: Schema ideas

Tags, date, metric type, aggregation
- Row key: Host, Service, Metric Name, Date
- Super column key: Metric Type, Aggregate Function
- Column key: Time Stamp

Tags, date, aggregation
- Row key: Host, Service, Metric Name, Metric Type, Date
- Super column key: Aggregation Interval
- Column key: Aggregate Function, Time Stamp

Date, tags, aggregation
- Row key: Date, Service, Host, Metric Name, Metric Type
- Super column key: Aggregation Interval
- Column key: Aggregate Function, Time Stamp

Date, tags, aggregation, no super column
- Row key: Date, Service, Host, Metric Name, Metric Type
- Column key: Aggregation Interval, Aggregate Function, Time Stamp

Secondary indexes (to support querying)
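
The last schema idea (composite column keys, no super column) can be modelled in memory to show why the order of key components matters. This is a sketch under stated assumptions: tuples stand in for composite keys, a sorted Python list stands in for the store's sorted columns, and every concrete value (service, host, metric names, numbers) is made up.

```python
import bisect
from collections import defaultdict

# Row key:    (date, service, host, metric name, metric type)
# Column key: (aggregation interval, aggregate function, timestamp)
rows = defaultdict(list)  # row key -> sorted list of (column key, value)


def insert(row_key, interval, function, timestamp, value):
    bisect.insort(rows[row_key], ((interval, function, timestamp), value))


def query(row_key, interval, function, start, end):
    """Return (timestamp, value) pairs with start <= timestamp < end."""
    columns = rows.get(row_key, [])
    # A 1-tuple sorts before any (column key, value) pair with the same key.
    lo = bisect.bisect_left(columns, ((interval, function, start),))
    hi = bisect.bisect_left(columns, ((interval, function, end),))
    return [(key[2], value) for key, value in columns[lo:hi]]


rk = ("2012-09-12", "billing", "host1", "duration", "timer")  # hypothetical
insert(rk, "minute", "avg", 60, 15.0)
insert(rk, "minute", "avg", 0, 12.0)
insert(rk, "minute", "max", 0, 20.0)
insert(rk, "hour", "avg", 0, 13.0)
insert(rk, "minute", "avg", 120, 18.0)
result = query(rk, "minute", "avg", 0, 120)
```

Putting the aggregation interval and aggregate function ahead of the timestamp keeps all values for one chart series adjacent, so the query "by date, tags, aggregation, time range" becomes a single contiguous slice of one row.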