跳转至

分类饼图数据处理\#

对实验得到的结果微塑料分类进行计数和可视化

原始数据

#,Id,Width (mirco_m),Height (mirco_m),Diameter (micro_m),Aspect Ratio,Area (mirco_m),Perimeter (mirco_m),Eccentricity,Circularity,Solidity,Identification,Notes,Match Type,Quality,Is Valid
1,A1,651,667,265.8883708,0.975582044,55525,3265.634892,0.684717284,0.065427929,0.169529044,Polypropylene (PP),,Auto,0.668195562,true
2,A2,157,135,144.2825927,1.16450215,16350,508.7005723,0.78690049,0.79396831,0.951965066,Chitin,,Auto,0.708585815,true
3,A3,79,296,143.1196743,0.266592912,16087.5,699.6193999,0.966117793,0.413023475,0.906338028,Cellulosic,,Auto,0.800200059,true
4,A4,159,98,121.7264359,1.613636458,11637.5,425.060963,0.738967585,0.809407295,0.975890985,Chitin,,Auto,0.771035871,true
5,A5,100,168,119.5496294,0.594059372,11225,446.2741661,0.836393013,0.708260007,0.943277311,Polyethylene (PE),,Auto,0.765315477,true

image-20231229030516796

我们只要提取出特征Identification即可

import pandas as pd

# 读取CSV文件
# 路径/Users/kiharari/Desktop/仪器分析实验/microplastics.csv
df = pd.read_csv('/Users/kiharari/Desktop/仪器分析实验/microplastics.csv')  

# 提取Identification列并统计每种微塑料的数量
identification_counts = df['Identification'].value_counts()

# 打印统计结果
print(identification_counts)

from matplotlib import pyplot as plt
plt.rc('font', family = 'Songti SC')

explode = [0.1] * len(identification_counts)  # 可以调整以突出显示特定部分
plt.figure(figsize=(10, 8))  # 设置图形大小
plt.pie(identification_counts, labels=identification_counts.index, autopct='%1.1f%%', startangle=140, explode=explode, shadow=True)
plt.axis('equal')  # 确保饼图是圆形的
plt.title('微塑料种类分布')
plt.show()

运行上面的代码可以得到:

微塑料详细种类分布

发现占比过小的部分标签和数值都挤在一起了,看不清楚

此时有两种优化方案:

额外增加一个legend

# 为每个部分设置“爆炸”距离
explode = [0.1] * len(identification_counts)  # 可以调整以突出显示特定部分

# 绘制饼图
plt.figure(figsize=(10, 8))
wedges, texts, autotexts = plt.pie(identification_counts, labels=identification_counts.index, autopct='%1.1f%%', startangle=140, explode=explode, shadow=True, pctdistance=0.85)

# 调整文本和线条
plt.setp(texts, size=8)
plt.setp(autotexts, size=8, weight="bold")
plt.axis('equal')

# 调整图例
plt.legend(loc="center left", bbox_to_anchor=(0.8, -0.1, 0.5, 1))

# 添加标题
plt.title('微塑料种类分布')

# 显示图表
plt.show()

微塑料详细种类分布(优化)

合并标签:

# 合并小部分到“其他”类别
small_categories_threshold = 0.05  # 设定阈值,例如5%
small_categories = identification_counts[identification_counts / identification_counts.sum() < small_categories_threshold]
other = small_categories.sum()
identification_counts = identification_counts[identification_counts / identification_counts.sum() >= small_categories_threshold]
identification_counts['其他'] = other

# 为每个部分设置“爆炸”距离
explode = [0.1 if identification_counts[i] == other else 0 for i in range(len(identification_counts))]

# 绘制饼图
plt.figure(figsize=(10, 8))
wedges, texts, autotexts = plt.pie(identification_counts, labels=identification_counts.index, autopct='%1.1f%%', startangle=140, explode=explode, shadow=True)

# 调整文本和线条
plt.setp(texts, size=8)
plt.setp(autotexts, size=8, weight="bold")
plt.axis('equal')

# 添加标题
plt.title('微塑料种类分布')

# 显示图表
plt.show()

这样就可以生成一个较为简单的图

微塑料大致种类分布

但是缺点是缺少信息

可以通过主图和次图来标明信息

# 合并小部分到“其他”类别
small_categories_threshold = 0.05  # 设定阈值,例如5%
small_categories = identification_counts[identification_counts / identification_counts.sum() < small_categories_threshold]
other = small_categories.sum()
main_categories = identification_counts[identification_counts / identification_counts.sum() >= small_categories_threshold]
main_categories['其他'] = other

# 为主饼图的每个部分设置“爆炸”距离
explode_main = [0.1 if category == '其他' else 0 for category in main_categories.index]

# 绘制主饼图
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
ax1.pie(main_categories, labels=main_categories.index, autopct='%1.1f%%', startangle=30, explode=explode_main, shadow=True)
ax1.set_title('微塑料种类分布 - 主图')

# 为次饼图的每个部分设置“爆炸”距离
explode_other = [0.1] * len(small_categories)

# 绘制次饼图
ax2.pie(small_categories, labels=small_categories.index, autopct='%1.1f%%', startangle=140, explode=explode_other, shadow=True)
ax2.set_title('微塑料种类分布 - “其他”')

# 在两个饼图之间添加箭头
fig.tight_layout()
fig.subplots_adjust(wspace=0.3)  # 调整子图间距
ax1.annotate('放大', xy=(1.2, 0.5), xytext=(0.8, 0.5),
             arrowprops=dict(facecolor='black', shrink=0.05), 
             xycoords='axes fraction', textcoords='axes fraction')

# 显示图表
plt.show()

微塑料种类分布(主次)

如何将副图缩小到合适的尺寸呢?

加上下面的代码:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6), gridspec_kw={'width_ratios': [3, 1]})  # 主饼图比副饼图大3倍