Combine Pandas Datasets: Concatenation and Merging Operations

Learn how to efficiently combine data from multiple sources using Pandas through concatenation and merging operations. Professor John Carelli from Kutztown University's Computer Science Department explains the process, options, and types of joins involved in merging relational data.

  • Pandas
  • Data Analysis
  • Concatenation
  • Merging
  • Data Science

Presentation Transcript


  1. Combining Pandas Datasets
     It is often necessary when doing data analysis to combine information from multiple sources.
     Pandas includes methods and functions for combining both Series and DataFrame objects.
     Two common operations:
       • concatenation
       • merging / joining
     Professor John Carelli, Kutztown University, Computer Science Department

  2. Concatenation
     The concat function:
       • can be used on either Series or DataFrame objects
       • accepts a list of objects and combines them into one extended object
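As a minimal sketch of the concat behavior described on this slide, the following combines two Series and two DataFrames. The variable names and values are made up for illustration and are not taken from the presentation's notebook.

```python
import pandas as pd

# Hypothetical example data; names and values are illustrative only.
s1 = pd.Series(["A", "B"], index=[1, 2])
s2 = pd.Series(["C", "D"], index=[3, 4])

# concat accepts a list of objects and returns one extended object.
combined_series = pd.concat([s1, s2])

df1 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# By default, DataFrames are stacked row-wise and the original
# index labels of each input are kept (so labels may repeat).
combined_frame = pd.concat([df1, df2])

print(combined_series)
print(combined_frame)
```

Note that the default result keeps each input's own row labels, which is why options such as ignore_index exist.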

  3. concat arguments
     Options for controlling how the data are combined:
       • axis: select the axis to concatenate along
       • ignore_index: if True, do not use the index values along the concatenation axis
       • verify_integrity: raise an exception if there are duplicate indices
       • keys: construct a hierarchical index using the keys values
       • join: specify an inner or outer join
     See PandasCombiningDatasets.ipynb for examples.
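The options listed on this slide can be sketched as follows. This is an illustrative example with invented frames (df1, df2), not a reproduction of PandasCombiningDatasets.ipynb.

```python
import pandas as pd

# Hypothetical frames that share column "b" and share row labels 0 and 1.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# axis=1 places the frames side by side, aligning on the row index.
side_by_side = pd.concat([df1, df2], axis=1)

# ignore_index=True discards the original row labels and renumbers from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

# verify_integrity=True raises ValueError if the result would contain
# duplicate index labels (here both inputs use labels 0 and 1).
try:
    pd.concat([df1, df2], verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

# keys builds a hierarchical (MultiIndex) row index, one outer label per input.
labeled = pd.concat([df1, df2], keys=["first", "second"])

# join="inner" keeps only the columns common to all inputs;
# the default, join="outer", keeps the union and fills gaps with NaN.
common_only = pd.concat([df1, df2], join="inner")

print(labeled)
```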

  4. Merging
     Merging performs operations on relational data.
     Relational data:
       • collections of data with pre-defined relationships
       • typically stored in sets of tables (as in a database)

  5. Merging operations
     Merging provides an interface to perform joining operations on objects containing relational data.
     Types of joins:
       • one-to-one: recognizes one column as a key and joins based on that
       • many-to-one: one key column has duplicate entries; duplicates are preserved
       • many-to-many: duplicate key entries in both tables; duplicates are again preserved
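The three join types above can be illustrated with pd.merge. The tables below (employees, hire_dates, supervisors, skills) are hypothetical and chosen only so that the key column is unique, duplicated on one side, or duplicated on both sides.

```python
import pandas as pd

# Hypothetical relational tables; all names and values are illustrative.
employees = pd.DataFrame({"employee": ["Ann", "Bob", "Cy"],
                          "group": ["Eng", "HR", "Eng"]})
hire_dates = pd.DataFrame({"employee": ["Bob", "Ann", "Cy"],
                           "hire_year": [2012, 2008, 2014]})
supervisors = pd.DataFrame({"group": ["Eng", "HR"],
                            "supervisor": ["Dee", "Eve"]})
skills = pd.DataFrame({"group": ["Eng", "Eng", "HR"],
                       "skill": ["coding", "math", "organization"]})

# One-to-one: "employee" is unique in both tables, so each row matches once.
one_to_one = pd.merge(employees, hire_dates)

# Many-to-one: "group" repeats in employees; the matching supervisor
# is repeated for every employee in that group.
many_to_one = pd.merge(employees, supervisors)

# Many-to-many: "group" repeats in both tables; every matching pair
# of rows produces one output row.
many_to_many = pd.merge(employees, skills)

print(many_to_many)
```

With no key specified, merge joins on the column names the two tables have in common, which is "employee" in the first call and "group" in the other two.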

  6. Options to merge
     Options for controlling how the data are combined:
       • on: which column should be the key
       • left_on and right_on: specify a key column with different names in each table
       • left_index and right_index: merge on the index instead of a column
       • how: specify an inner, outer, left, or right join
       • suffixes: append a suffix to any conflicting column names
     See PandasCombiningDatasets.ipynb for examples.
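A short sketch of the merge options named on this slide, using invented tables a, b, and c. It is meant as an illustration under those assumptions rather than the notebook's own examples.

```python
import pandas as pd

# Hypothetical tables; "rank" appears in both a and b and therefore conflicts.
a = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "rank": [1, 2, 3]})
b = pd.DataFrame({"name": ["Ann", "Bob", "Dan"], "rank": [3, 1, 4]})
c = pd.DataFrame({"employee": ["Ann", "Bob"], "salary": [70, 80]})

# on + suffixes: join on "name" and rename the conflicting "rank" columns.
on_name = pd.merge(a, b, on="name", suffixes=("_a", "_b"))

# left_on / right_on: the key is called "name" in a but "employee" in c.
different_names = pd.merge(a, c, left_on="name", right_on="employee")

# left_index / right_index: join on the row index instead of a column.
by_index = pd.merge(a.set_index("name"), b.set_index("name"),
                    left_index=True, right_index=True)

# how: the default is an inner join; "outer" keeps keys from either table,
# "left" keeps every key from the left table. Missing values become NaN.
outer = pd.merge(a, b, on="name", how="outer")
left_join = pd.merge(a, b, on="name", how="left")

print(outer)
```

When suffixes is not given, pandas falls back to "_x" and "_y" for conflicting column names, as in by_index above.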

  7. pivot and pivot_table
     Both create a new DataFrame from selected columns of an existing DataFrame.
     pivot_table is more general:
       • allows for a list on the index (MultiIndex)
       • aggregates duplicate entries
     See PandasCombiningDatasets.ipynb for examples.
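The difference in how pivot and pivot_table treat duplicate entries can be sketched as below. The sales data and the choice of aggfunc="sum" are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format data; each (region, quarter) pair appears once.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount": [10, 20, 30, 40],
})

# pivot reshapes selected columns into a new DataFrame.
wide = sales.pivot(index="region", columns="quarter", values="amount")

# Add a second (East, Q1) row so that one index/column pair is duplicated.
extra = pd.DataFrame({"region": ["East"], "quarter": ["Q1"], "amount": [5]})
sales_dup = pd.concat([sales, extra], ignore_index=True)

# pivot cannot reshape duplicate entries and raises ValueError.
try:
    sales_dup.pivot(index="region", columns="quarter", values="amount")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (summed here).
totals = sales_dup.pivot_table(index="region", columns="quarter",
                               values="amount", aggfunc="sum")

# A list passed as index produces a MultiIndex on the rows.
by_region_quarter = sales_dup.pivot_table(index=["region", "quarter"],
                                          values="amount", aggfunc="sum")

print(totals)
```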
