Reposted from the “理想国@Data” blog: Revisiting Python (5): Reading Data

Published: 2025/3/21 · python · 豆豆

This article summarizes how to read data in Python. It covers reading from text files, from Excel files in particular (commonly used for offline exploratory data analysis), and from structured databases (using MySQL as the example).

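Conventions:

import numpy as np
import pandas as pd

The sample output in this article was produced under Python 2, which is why some results show u'...' string prefixes and an L suffix on integers.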

1. Reading from text files

(1) Using the read, readline, and readlines methods from the Python standard library

a. General workflow:

step1: create a file object with open
step2: read the file contents with read, readline, or readlines
step3: close the file object with close
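The three steps above can also be written with a with block, which closes the file automatically even if reading fails. The following is a minimal sketch under that assumption; the temporary sample file and the helper name read_all are made up for illustration and do not come from the original article.

```python
import os
import tempfile


def read_all(path):
    """Open the file, read everything, and close it automatically."""
    # Leaving the with block calls f.close(), even if read() raises.
    with open(path) as f:
        return f.read()


# Create a small sample file so the example is self-contained.
sample = "col1 col2 col3\n101 20 0.68\n102 30 0.79\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

text = read_all(tmp.name)
print(text)

# Remove the temporary file again.
os.remove(tmp.name)
```

The explicit open/close sequence and the with form read the same data; the with form simply removes the risk of forgetting step3.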

b. Differences:

Example file: test.txt

read method: reads the entire file and returns a single string (all lines joined into one string).

# Open the file
f = open('/labcenter/python/pandas/test.txt')
# Read the whole file with read()
data1 = f.read()
print(data1)
type(data1)
# Close the file
f.close()

Result:

col1 col2 col3
101 20 0.68
102 30 0.79
103 50 0.72
104 60 0.64
105 70 0.55
str

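readline method: reads a single line and returns it as a string. Each call advances the file position to the next line, so traversing every record means calling readline repeatedly (or iterating over the file object); seek can be used to move the position back.

# Open the file
f = open('/labcenter/python/pandas/test.txt')
# Read one line with readline()
data2 = f.readline()
print(data2)
type(data2)
# Close the file
f.close()

Result:

col1 col2 col3
str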
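Traversing all records with readline can be sketched as a loop that stops when readline returns an empty string, which is how the end of the file is signalled. In this sketch io.StringIO stands in for a file opened with open, and the sample contents are hypothetical.

```python
import io

# io.StringIO behaves like an open text file; the contents are made up.
f = io.StringIO("col1 col2 col3\n101 20 0.68\n102 30 0.79\n")

# readline() returns '' only at end of file, so the loop ends there.
rows = []
line = f.readline()
while line:
    rows.append(line.split())
    line = f.readline()

# Iterating over the file object visits the same lines more idiomatically.
f.seek(0)  # move the file position back to the start
rows_by_iteration = [line.split() for line in f]

f.close()
print(rows)
```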

readlines method: reads the entire file and returns a list (one list element per line).

# Open the file
f = open('/labcenter/python/pandas/test.txt')
# Read all lines with readlines()
data3 = f.readlines()
print(data3)
type(data3)
for line in data3:
    print(line)
# Close the file
f.close()

Result:

['col1 col2 col3\r\n', '101 20 0.68\r\n', '102 30 0.79\r\n', '103 50 0.72\r\n', '104 60 0.64\r\n', '105 70 0.55']
list
col1 col2 col3
101 20 0.68
102 30 0.79
103 50 0.72
104 60 0.64
105 70 0.55

c. Supported file types:

txt, csv, tsv, and any other text file whose fields are separated by a fixed delimiter.
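Because these methods return raw strings, splitting each line into fields is left to the caller. A minimal sketch of that step, assuming comma-separated input with Windows line endings; the helper name parse_delimited is hypothetical, and io.StringIO stands in for a real file.

```python
import io


def parse_delimited(lines, sep=None):
    """Split each line on sep (None means any whitespace), skipping blank lines."""
    return [line.rstrip("\r\n").split(sep) for line in lines if line.strip()]


# readlines() keeps the line terminators, so they are stripped before splitting.
lines = io.StringIO("col1,col2,col3\r\n101,20,0.68\r\n102,30,0.79").readlines()
table = parse_delimited(lines, sep=",")

header, records = table[0], table[1:]
print(header)
print(records)
```

Passing sep="\t" would handle a tsv file in the same way, and leaving sep as None splits on runs of spaces.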

(2) Using the loadtxt, load, and fromfile methods from the NumPy library

a. loadtxt method

Reads a txt text file and returns an array.

np.loadtxt('/labcenter/python/pandas/test.txt',skiprows=1)

Out[413]:

array([[ 101.  ,   20.  ,    0.68],
       [ 102.  ,   30.  ,    0.79],
       [ 103.  ,   50.  ,    0.72],
       [ 104.  ,   60.  ,    0.64],
       [ 105.  ,   70.  ,    0.55]])
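loadtxt also accepts a delimiter and a column selection, which helps when the file is comma-separated or only some columns are needed. A short sketch, assuming a comma-separated variant of the sample data; io.StringIO stands in for a file path here.

```python
import io

import numpy as np

# Hypothetical comma-separated version of the sample file.
text = io.StringIO("col1,col2,col3\n101,20,0.68\n102,30,0.79\n103,50,0.72\n")

# skiprows=1 skips the header line; delimiter sets the field separator.
arr = np.loadtxt(text, delimiter=",", skiprows=1)

# usecols picks columns by position; unpack=True returns one array per column.
text.seek(0)
col1, col3 = np.loadtxt(text, delimiter=",", skiprows=1, usecols=(0, 2), unpack=True)

print(arr.shape)
print(col1, col3)
```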

b. load method

Reads NumPy's own binary data files, which are usually produced by NumPy's save or savez methods.

write = np.array([[1,2,3,4],[5,6,7,8]])
np.save('output',write)
data = np.load('output.npy')
print(data)
type(data)

Result:

[[1 2 3 4]
 [5 6 7 8]]
numpy.ndarray
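The savez counterpart stores several named arrays in one .npz archive, and load then returns an archive object indexed by those names. A minimal sketch; the array names a and b and the temporary path are chosen for illustration only.

```python
import os
import tempfile

import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
b = np.linspace(0.0, 1.0, 5)

# Write both arrays into a single .npz archive under the keys "a" and "b".
path = os.path.join(tempfile.mkdtemp(), "arrays.npz")
np.savez(path, a=a, b=b)

# np.load on an .npz file returns an archive; arrays are read when indexed.
with np.load(path) as archive:
    names = sorted(archive.files)
    a_loaded = archive["a"]
    b_loaded = archive["b"]

print(names)
print(a_loaded)
```

Unlike tofile, both save and savez record the dtype and shape, so the arrays come back exactly as written.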

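c. fromfile method

Reads simple text files and binary files, usually ones produced by NumPy's tofile method.

write = np.array([[1,2,3,4],[5,6,7,8]])
write.tofile('output')
data = np.fromfile('output',dtype='float32')
print(data)
type(data)

Result:

[  1.40129846e-45   0.00000000e+00   2.80259693e-45 ...,   0.00000000e+00
   1.12103877e-44   0.00000000e+00]
numpy.ndarray

The values printed here are not the ones that were written: the array was saved with an integer dtype but read back as float32. tofile stores neither the dtype nor the shape, so the dtype passed to fromfile must match the one used when writing, and the result always comes back one-dimensional.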
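tofile writes raw bytes without recording the dtype or shape, so fromfile only recovers the original values when it is given the dtype that was written. A sketch of a matching round trip, assuming float32 data; the temporary path is illustrative.

```python
import os
import tempfile

import numpy as np

# Fix the dtype explicitly so the writer and the reader agree.
write = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "output.bin")
write.tofile(path)

# fromfile returns a flat array; the shape has to be restored by hand.
flat = np.fromfile(path, dtype=np.float32)
data = flat.reshape(write.shape)

print(data)
```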

(3) Using read_csv, read_table, read_excel, and related methods from the Pandas library

a. read_csv method

Reads a csv file and returns a DataFrame object or a TextParser object.

Example file: test.csv

data = pd.read_csv('/labcenter/python/pandas/test.csv')
print(data)
type(data)

Result:

   col1  col2  col3
0   101    20  0.68
1   102    30  0.79
2   103    50  0.72
3   104    60  0.64
4   105    70  0.55
pandas.core.frame.DataFrame
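read_csv takes many optional parameters; a few common ones select columns, set the index, and fix column types at load time. A small sketch, assuming csv text shaped like the sample file; io.StringIO stands in for the file path.

```python
import io

import pandas as pd

csv_text = "col1,col2,col3\n101,20,0.68\n102,30,0.79\n103,50,0.72\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["col1", "col3"],   # load only these columns
    index_col="col1",           # use col1 as the row index
    dtype={"col3": "float64"},  # force the type of col3
)

print(df)
```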

b. read_table method

Reads text files separated by an arbitrary delimiter and returns a DataFrame object or a TextParser object.

data = pd.read_table('/labcenter/python/pandas/test.csv',sep=',')
print(data)
type(data)

Result:

   col1  col2  col3
0   101    20  0.68
1   102    30  0.79
2   103    50  0.72
3   104    60  0.64
4   105    70  0.55
pandas.core.frame.DataFrame
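The parser object mentioned as the alternative return type appears when the file is read in pieces, for example by passing chunksize (recent pandas versions name this class TextFileReader). Iterating over it yields one DataFrame per chunk. A minimal sketch, with io.StringIO standing in for a large file:

```python
import io

import pandas as pd

csv_text = (
    "col1,col2,col3\n"
    "101,20,0.68\n102,30,0.79\n103,50,0.72\n104,60,0.64\n105,70,0.55\n"
)

# With chunksize set, read_csv returns an iterator instead of a DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

chunk_sizes = []
total = 0
for chunk in reader:
    # Each chunk is an ordinary DataFrame holding at most 2 rows.
    chunk_sizes.append(len(chunk))
    total += chunk["col2"].sum()

print(chunk_sizes, total)
```

Processing chunk by chunk keeps memory use bounded when the file is too large to load at once.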

c. read_excel method

Reads an Excel file and returns a DataFrame object (or a dict of DataFrames when several sheets are requested).

Example file: test.xlsx

data = pd.read_excel('/labcenter/python/pandas/test.xlsx')
print(data)
type(data)

Result:

   col1  col2  col3
0   101    21  22.6
1   102    31  31.2
2   103    41  32.7
3   104    51  28.2
4   105    61  18.9
pandas.core.frame.DataFrame

d. Other methods

read_sql method: reads the result of a SQL query or a whole database table.

read_json method: reads JSON files.
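Both methods return a DataFrame directly. The sketch below uses an in-memory SQLite database from the standard library in place of a real database server, and a literal JSON string in place of a file; the table name mytab1 and its rows are sample values chosen for illustration.

```python
import io
import sqlite3

import pandas as pd

# Build a throwaway SQLite database so the example needs no server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab1 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytab1 VALUES (?, ?)",
    [(1, "aaa"), (2, "bbb"), (3, "ccc")],
)
conn.commit()

# read_sql runs the query on the connection and returns a DataFrame.
df_sql = pd.read_sql("SELECT id, name FROM mytab1 ORDER BY id", conn)
conn.close()

# read_json parses a list of records into rows; StringIO acts as the file.
json_text = '[{"col1": 101, "col2": 20}, {"col1": 102, "col2": 30}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df_sql)
print(df_json)
```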

(4) How to choose?

a. Use the method you know best.

b. Choose according to the scenario:

① Plain-text, unstructured data: the three standard-library methods

② Structured, numeric data intended for matrix computation or modeling: NumPy's loadtxt method

③ Binary data: NumPy's load and fromfile methods

④ Structured data intended for exploratory analysis: the Pandas methods

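2. Reading from Excel files

Excel is often the format in which data is delivered for offline exploratory analysis, so it gets its own summary here.

(1) Using the read_excel method from the Pandas library

See section 1 (3) c above.

(2) Using other third-party libraries

Taking the xlrd library as the example: the xlrd module reads the contents of Excel files. Note that xlrd 2.0 and later read only the legacy .xls format, so opening an .xlsx file as shown here requires xlrd 1.2 or earlier.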

import xlrd
# Open an Excel file
xlsx=xlrd.open_workbook('/labcenter/python/pandas/test.xlsx')
# List the sheet names
sheets=xlsx.sheet_names()
sheets
# Get one sheet
sheet1=xlsx.sheets()[0]
# Name of the sheet
sheet1.name
# Number of rows in the sheet
sheet1.nrows
# Number of columns in the sheet
sheet1.ncols
# Values of one row
sheet1.row_values(1)
# Values of one column
sheet1.col_values(1)
# Value of a single cell (two equivalent ways)
sheet1.row(1)[2].value
sheet1.cell_value(1,2)
# Walk through the sheet row by row
for i in range(sheet1.nrows):
    print(sheet1.row_values(i))

Result:

[u'Sheet1', u'Sheet2']
u'Sheet1'
[101.0, 21.0, 22.6]
[u'col2', 21.0, 31.0, 41.0, 51.0, 61.0]
22.6
[u'col1', u'col2', u'col3']
[101.0, 21.0, 22.6]
[102.0, 31.0, 31.2]
[103.0, 41.0, 32.7]
[104.0, 51.0, 28.2]
[105.0, 61.0, 18.9]

3. Reading from structured databases

Choose the library that matches the database: for example, MySQLdb for MySQL, cx_Oracle for Oracle, teradata for Teradata, and so on.

General workflow:

step1: open a database connection
step2: get a cursor with the cursor method
step3: run the SQL statement with the execute method
step4: fetch the returned records with the fetchall method
step5: close the cursor with the close method
step6: close the database connection with the close method
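The six steps above follow Python's common database API, so they can be tried without a database server by substituting the standard-library sqlite3 module for the vendor library. The sketch below does exactly that; the table mytab1 and its rows are sample data, and try/finally is added so that steps 5 and 6 run even if a query fails.

```python
import sqlite3

# step1: open a connection (an in-memory SQLite database in this sketch)
conn = sqlite3.connect(":memory:")
try:
    # step2: get a cursor
    cursor = conn.cursor()
    try:
        # Set up sample data; a real database would already contain it.
        cursor.execute("CREATE TABLE mytab1 (id INTEGER PRIMARY KEY, name TEXT)")
        cursor.executemany(
            "INSERT INTO mytab1 (id, name) VALUES (?, ?)",
            [(1, "aaa"), (2, "bbb")],
        )
        # step3: run the query, passing values as parameters
        cursor.execute("SELECT id, name FROM mytab1 WHERE id >= ? ORDER BY id", (1,))
        # step4: fetch all returned records as a list of tuples
        results = cursor.fetchall()
    finally:
        # step5: close the cursor
        cursor.close()
finally:
    # step6: close the connection
    conn.close()

for row in results:
    print(row)
```

sqlite3 marks query parameters with ?, whereas MySQLdb uses %s; the rest of the flow is the same.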

Example:

import MySQLdb
# Open the database connection
conn = MySQLdb.connect("localhost", "root", "root", "testdb", charset='utf8')
# Get a cursor
cursor = conn.cursor()
# Run the SQL statement
cursor.execute("select * from mytab1;")
# Fetch the returned records
results = cursor.fetchall()
# Print them row by row
for result in results:
    print(result)
# Close the cursor
cursor.close()
# Close the database connection
conn.close()

Result:

(1L, u'aaa')
(2L, u'bbb')
(3L, u'ccc')
(4L, u'ddd')
(5L, u'eee')

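The MySQLdb library can be installed with pip install MySQL-python. That package supports Python 2 only; on Python 3, the mysqlclient fork provides the same MySQLdb module.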
