Data Processing for a Category Pie Chart
Count and visualize the microplastic categories identified in the experiment.
Raw data
#,Id,Width (mirco_m),Height (mirco_m),Diameter (micro_m),Aspect Ratio,Area (mirco_m),Perimeter (mirco_m),Eccentricity,Circularity,Solidity,Identification,Notes,Match Type,Quality,Is Valid
1,A1,651,667,265.8883708,0.975582044,55525,3265.634892,0.684717284,0.065427929,0.169529044,Polypropylene (PP),,Auto,0.668195562,true
2,A2,157,135,144.2825927,1.16450215,16350,508.7005723,0.78690049,0.79396831,0.951965066,Chitin,,Auto,0.708585815,true
3,A3,79,296,143.1196743,0.266592912,16087.5,699.6193999,0.966117793,0.413023475,0.906338028,Cellulosic,,Auto,0.800200059,true
4,A4,159,98,121.7264359,1.613636458,11637.5,425.060963,0.738967585,0.809407295,0.975890985,Chitin,,Auto,0.771035871,true
5,A5,100,168,119.5496294,0.594059372,11225,446.2741661,0.836393013,0.708260007,0.943277311,Polyethylene (PE),,Auto,0.765315477,true
We only need to extract the Identification column.
import pandas as pd
# Read the CSV file
# Path: /Users/kiharari/Desktop/仪器分析实验/microplastics.csv
df = pd.read_csv('/Users/kiharari/Desktop/仪器分析实验/microplastics.csv')
# Extract the Identification column and count how many particles fall into each category
identification_counts = df['Identification'].value_counts()
# Print the counts
print(identification_counts)
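Because the CSV path above is specific to the author's machine, the counting step can also be tried on the five sample rows listed under the raw data. This is a minimal sketch that assumes only the `Id` and `Identification` columns matter; the names `sample_csv`, `sample_df`, `sample_counts`, and `sample_shares` are introduced here for illustration.

```python
import io

import pandas as pd

# The five sample rows from the raw data, reduced to the two relevant columns
sample_csv = """Id,Identification
A1,Polypropylene (PP)
A2,Chitin
A3,Cellulosic
A4,Chitin
A5,Polyethylene (PE)
"""

sample_df = pd.read_csv(io.StringIO(sample_csv))

# Absolute number of particles per category
sample_counts = sample_df['Identification'].value_counts()

# Relative shares (fractions that sum to 1); a pie chart displays these proportions
sample_shares = sample_df['Identification'].value_counts(normalize=True)

print(sample_counts)
print(sample_shares)
```

`value_counts()` sorts by count in descending order, so the largest category comes first in the pie chart.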
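from matplotlib import pyplot as plt
plt.rc('font', family='Songti SC')  # a font with Chinese glyphs, needed for the Chinese title
explode = [0.1] * len(identification_counts)  # offset every slice; change individual entries to highlight specific slices
plt.figure(figsize=(10, 8))  # set the figure size
plt.pie(identification_counts, labels=identification_counts.index, autopct='%1.1f%%', startangle=140, explode=explode, shadow=True)
plt.axis('equal')  # keep the pie circular
plt.title('微塑料种类分布')  # "Distribution of microplastic types"
plt.show()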
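Running the code above produces a pie chart.
The labels and percentages of the smallest slices crowd together and are hard to read.
There are two ways to improve this:
Add a separate legend: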
# Set the "explode" offset for every slice
explode = [0.1] * len(identification_counts)  # change individual entries to highlight specific slices
# Draw the pie chart
plt.figure(figsize=(10, 8))
wedges, texts, autotexts = plt.pie(identification_counts, labels=identification_counts.index, autopct='%1.1f%%', startangle=140, explode=explode, shadow=True, pctdistance=0.85)
# Shrink the label text and the percentage text
plt.setp(texts, size=8)
plt.setp(autotexts, size=8, weight="bold")
plt.axis('equal')
# Position the legend
plt.legend(loc="center left", bbox_to_anchor=(0.8, -0.1, 0.5, 1))
# Add the title
plt.title('微塑料种类分布')  # "Distribution of microplastic types"
# Show the chart
plt.show()
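The block above still draws the category labels on the wedges, so the smallest slices can keep overlapping. One possible variant, sketched here on the assumption that all text may move into the legend, drops the wedge labels and puts the category name and its percentage into each legend entry. The helper names `legend_labels` and `plot_with_legend` and the demo counts are hypothetical and not part of the original analysis.

```python
import pandas as pd


def legend_labels(counts):
    """Return one 'name (xx.x%)' string per category."""
    shares = counts / counts.sum() * 100
    return [f"{name} ({share:.1f}%)" for name, share in shares.items()]


def plot_with_legend(counts):
    """Draw a pie without wedge labels and list the categories in a legend."""
    # Imported here so legend_labels can be used without matplotlib installed
    from matplotlib import pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 8))
    # Without autopct, the first returned element holds the wedge patches
    wedges = ax.pie(counts, startangle=140)[0]
    ax.axis('equal')
    # Anchor the legend just outside the right edge of the axes
    ax.legend(wedges, legend_labels(counts), loc="center left", bbox_to_anchor=(1.0, 0.5))
    return fig


# Made-up counts for illustration only, not the experiment's results
demo_counts = pd.Series({'A': 6, 'B': 3, 'C': 1})
print(legend_labels(demo_counts))
```

Calling `plot_with_legend(identification_counts)` followed by `plt.show()` would apply the same idea to the real counts.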
Merge the small categories into a single label:
# Merge the categories below the threshold into an "其他" ("Other") slice
small_categories_threshold = 0.05  # share threshold, e.g. 5%
shares = identification_counts / identification_counts.sum()
small_categories = identification_counts[shares < small_categories_threshold]
other = small_categories.sum()
# Store the result under a new name so identification_counts keeps the unmerged counts
merged_counts = identification_counts[shares >= small_categories_threshold].copy()
merged_counts['其他'] = other
# Offset only the "其他" slice; match on the label, since another category's count could equal `other`
explode = [0.1 if label == '其他' else 0 for label in merged_counts.index]
# Draw the pie chart
plt.figure(figsize=(10, 8))
wedges, texts, autotexts = plt.pie(merged_counts, labels=merged_counts.index, autopct='%1.1f%%', startangle=140, explode=explode, shadow=True)
# Shrink the label text and the percentage text
plt.setp(texts, size=8)
plt.setp(autotexts, size=8, weight="bold")
plt.axis('equal')
# Add the title
plt.title('微塑料种类分布')  # "Distribution of microplastic types"
# Show the chart
plt.show()
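This produces a simpler chart.
The drawback is that the breakdown of the merged categories is lost.
A main chart paired with a secondary chart can show that detail: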
# Recompute the counts so this block starts from the unmerged data
identification_counts = df['Identification'].value_counts()
# Split the categories at the threshold and merge the small ones into "其他" ("Other")
small_categories_threshold = 0.05  # share threshold, e.g. 5%
shares = identification_counts / identification_counts.sum()
small_categories = identification_counts[shares < small_categories_threshold]
other = small_categories.sum()
main_categories = identification_counts[shares >= small_categories_threshold].copy()
main_categories['其他'] = other
# Offset only the "其他" slice of the main pie
explode_main = [0.1 if category == '其他' else 0 for category in main_categories.index]
# Draw the main pie
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
ax1.pie(main_categories, labels=main_categories.index, autopct='%1.1f%%', startangle=30, explode=explode_main, shadow=True)
ax1.set_title('微塑料种类分布 - 主图')  # "Distribution of microplastic types - main chart"
# Offset every slice of the secondary pie
explode_other = [0.1] * len(small_categories)
# Draw the secondary pie; its percentages are shares within "其他", not of the whole sample
ax2.pie(small_categories, labels=small_categories.index, autopct='%1.1f%%', startangle=140, explode=explode_other, shadow=True)
ax2.set_title('微塑料种类分布 - “其他”')  # "Distribution of microplastic types - Other"
# Add an arrow between the two pies
fig.tight_layout()
fig.subplots_adjust(wspace=0.3)  # adjust the spacing between the subplots
ax1.annotate('放大', xy=(1.2, 0.5), xytext=(0.8, 0.5),
             arrowprops=dict(facecolor='black', shrink=0.05),
             xycoords='axes fraction', textcoords='axes fraction')  # '放大' means "zoom in"
# Show the chart
plt.show()
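The thresholding logic is repeated in both merge examples above, so it could be factored into a small helper. The sketch below uses a hypothetical function name, `split_by_share`, and made-up counts; it also skips the "其他" slice when nothing falls below the threshold, which the inline versions do not handle.

```python
import pandas as pd


def split_by_share(counts, threshold=0.05, other_label='其他'):
    """Split counts into (main, small) at the given share threshold.

    `main` holds the categories at or above the threshold plus one
    `other_label` entry with the summed small counts; `small` holds
    the categories below the threshold.
    """
    shares = counts / counts.sum()
    small = counts[shares < threshold]
    main = counts[shares >= threshold].copy()
    if not small.empty:
        # Only add the merged slice when there is something to merge
        main[other_label] = small.sum()
    return main, small


# Made-up counts for illustration only, not the experiment's results
demo_counts = pd.Series({'PP': 50, 'PE': 40, 'PS': 6, 'PET': 3, 'PVC': 1})
main, small = split_by_share(demo_counts)
print(main)
print(small)
```

With the real data, `main_categories, small_categories = split_by_share(identification_counts)` would replace the first few lines of either merge block.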
How can the secondary chart be shrunk to a suitable size?
Add the following code: