當(dāng)前位置：首頁(yè) > 编程资源 > 编程问答 >内容正文

编程问答

[转载] 字符串操作截取后面的字符串_对字符串的5个必知的熊猫操作

發(fā)布時(shí)間：2025/3/11 编程问答 23 豆豆

生活随笔收集整理的這篇文章主要介紹了 [转载] 字符串操作截取后面的字符串_对字符串的5个必知的熊猫操作小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

參考鏈接：修剪Java中的字符串(刪除前導(dǎo)和尾隨空格)

字符串操作截取后面的字符串

? ?

? ??

? ? ?We have to represent every bit of data in numerical values to be processed and analyzed by machine learning and deep learning models. However, strings do not usually come in a nice and clean format and require preprocessing to convert to numerical values. Pandas offers many versatile functions to modify and process string data efficiently.

? ? ? 我們必須以數(shù)值表示數(shù)據(jù)的每一位，以便通過(guò)機(jī)器學(xué)習(xí)和深度學(xué)習(xí)模型進(jìn)行處理和分析。但是，字符串通常不會(huì)采用簡(jiǎn)潔的格式，需要進(jìn)行預(yù)處理才能轉(zhuǎn)換為數(shù)值。熊貓?zhí)峁┝嗽S多通用功能，可以有效地修改和處理字符串?dāng)?shù)據(jù)。??

? ? ?In this post, we will discover how Pandas can manipulate strings. I grouped string functions and methods under 5 categories:

? ? ? 在本文中，我們將發(fā)現(xiàn)Pandas如何操縱字符串。我將字符串函數(shù)和方法分為5類：??

? ? ?Splitting 分裂 Stripping 剝離 Replacing 更換 Filtering 篩選 Combining 結(jié)合?

? ? ?Let’s first create a sample dataframe to work on for examples.

? ? ? 讓我們首先創(chuàng)建一個(gè)示例數(shù)據(jù)框以進(jìn)行示例。??

? ? ?import numpy as npimport pandas as pdsample = {'col_a':['Houston,TX', 'Dallas,TX', 'Chicago,IL', 'Phoenix,AZ',? ? ? 'San Diego,CA'],'col_b':['$64K-$72K', '$62K-$70K', '$69K-$76K', '$62K-$72K', '$71K-$78K' ],'col_c':['A','B','A','a','c'],'col_d':['? 1x', ' 1y', '2x? ', '1x', '1y? ']}df_sample = pd.DataFrame(sample)df_sample

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ? 1.分裂 (1. Splitting)?

? ? ?Sometimes strings carry more than one piece of information and we may need to use them separately. For instance, “col_a” contains both city and state. The split function of pandas is a highly flexible function to split strings.

? ? ? 有時(shí)字符串包含不止一條信息，我們可能需要單獨(dú)使用它們。例如，“ col_a”包含城市和州。 pandas的split函數(shù)是用于拆分字符串的高度靈活的函數(shù)。??

? ? ?df_sample['col_a'].str.split(',')0? ? ? [Houston, TX] 1? ? ? ?[Dallas, TX] 2? ? ? [Chicago, IL] 3? ? ? [Phoenix, AZ] 4? ? [San Diego, CA] Name: col_a, dtype: object

? ? ?Now each element is converted to a list based on the character used for splitting. We can easily export individual elements from those lists. Let’s create a “state” column.

? ? ? 現(xiàn)在，每個(gè)元素都會(huì)根據(jù)用于拆分的字符轉(zhuǎn)換為列表。我們可以輕松地從這些列表中導(dǎo)出單個(gè)元素。讓我們創(chuàng)建一個(gè)“狀態(tài)”列。??

? ? ?df_sample['state'] = df_sample['col_a'].str.split(',').str[1]df_sample

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?Warning: Subscript ([1]) must be applied with str keyword. Otherwise, we will get the list in the specified row.

? ? ? 警告：下標(biāo)([1])必須與str關(guān)鍵字一起應(yīng)用。否則，我們將在指定的行中獲取列表。??

? ? ?df_sample['col_a'].str.split(',')[1]['Dallas', 'TX']

? ? ?The splitting can be done on any character or letter.

? ? ? 可以對(duì)任何字符或字母進(jìn)行拆分。??

? ? ?The split function returns a dataframe if expand parameter is set as True.

? ? ? 如果將expand參數(shù)設(shè)置為True，則split函數(shù)將返回一個(gè)數(shù)據(jù)幀。??

? ? ?df_sample['col_a'].str.split('a', expand=True)

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ? 拆分vs rsplit (split vs rsplit)?

? ? ?By default, splitting is done from the left. To do splitting on the right, use rsplit.

? ? ? 默認(rèn)情況下，拆分是從左側(cè)開始的。要在右側(cè)進(jìn)行拆分，請(qǐng)使用rsplit 。??

? ? ?Consider the series below:

? ? ? 考慮以下系列：??

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?Let’s apply split function and limit the number of splits with n parameter:

? ? ? 讓我們應(yīng)用split函數(shù)并使用n參數(shù)限制拆分次數(shù)：??

? ? ?categories.str.split('-', expand=True, n=2)

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?Only 2 splits on the left are performed. If we do the same operation with rsplit:

? ? ? 左側(cè)僅執(zhí)行2個(gè)拆分。如果我們對(duì)rsplit執(zhí)行相同的操作：??

? ? ?categories.str.rsplit('-', expand=True, n=2)

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?Same operation is done but on the right.

? ? ? 完成相同的操作，但在右側(cè)。??

? ??

? ?

? ??

? ? ? 2.剝離 (2. Stripping)?

? ? ?Stripping is like trimming tree branches. We can remove spaces or any other characters at the beginning or end of a string.

? ? ? 剝離就像修剪樹枝。我們可以刪除字符串開頭或結(jié)尾的空格或任何其他字符。??

? ? ?For instance, the strings in “col_b” has $ character at the beginning which can be removed with lstrip:

? ? ? 例如，“ col_b”中的字符串開頭有$字符，可以使用lstrip將其刪除：??

? ? ?df_sample['col_b'].str.lstrip('$')0? ? 64K-$72K 1? ? 62K-$70K 2? ? 69K-$76K 3? ? 62K-$72K 4? ? 71K-$78K Name: col_b, dtype: object

? ? ?Similary, rstrip is used to trim off characters from the end.

? ? ? 類似地， rstrip用于從末尾修剪字符。??

? ? ?Strings may have spaces at the beginning or end. Consider “col_d” in our dataframe.

? ? ? 字符串的開頭或結(jié)尾可以有空格。考慮一下我們數(shù)據(jù)框中的“ col_d”。??

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?Those leading and trailing spaces can be removed with strip:

? ? ? 那些前導(dǎo)和尾隨空格可以用strip除去：??

? ? ?df_sample['col_d'] = df_sample['col_d'].str.strip()

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ??

? ?

? ??

? ? ? 3.更換 (3. Replacing)?

? ? ?Pandas replace function is used to replace values in rows or columns. Similarly, replace as a string operation is used to replace characters in a string.

? ? ? 熊貓?zhí)鎿Q功能用于替換行或列中的值。同樣，替換為字符串操作用于替換字符串中的字符。??

? ? ?Let’s replace “x” letters in “col_d” with “z”.

? ? ? 讓我們用“ z”替換“ col_d”中的“ x”個(gè)字母。??

? ? ?df_sample['col_d'] = df_sample['col_d'].str.replace('x', 'z')

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ??

? ?

? ??

? ? ? 4.篩選 (4. Filtering)?

? ? ?We can filter strings based on the first and last characters. The functions to use are startswith() and endswith().

? ? ? 我們可以根據(jù)第一個(gè)和最后一個(gè)字符來(lái)過(guò)濾字符串。要使用的函數(shù)是startswith()和endswith() 。??

? ? ?Here is our original dataframe:

? ? ? 這是我們的原始數(shù)據(jù)框：??

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?Here is a filtered version that only includes rows in which “col_a” ends with the letter “x”.

? ? ? 這是一個(gè)過(guò)濾的版本，僅包含“ col_a”以字母“ x”結(jié)尾的行。??

? ? ?df_sample[df_sample['col_a'].str.endswith('X')]

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?Or, rows in which “col_b” starts with “$6”:

? ? ? 或者，其中“ col_b”以“ $ 6”開頭的行：??

? ? ?df_sample[df_sample['col_b'].str.startswith('$6')]

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?We can also filter strings by extracting certain characters. For instace, we can get the first 2 character of strings in a column or series by str[:2].

? ? ? 我們還可以通過(guò)提取某些字符來(lái)過(guò)濾字符串。對(duì)于instace，我們可以通過(guò)str [：2]獲得列或系列中字符串的前2個(gè)字符。??

? ? ?“col_b” represents a value range but numerical values are hidden in a string. Let’s extract them with string subscripts:

? ? ? “ col_b”表示值范圍，但數(shù)值隱藏在字符串中。讓我們用字符串下標(biāo)提取它們：??

? ? ?lower? = df_sample['col_b'].str[1:3]

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ? ?upper? = df_sample['col_b'].str[-3:-1]

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ??

? ?

? ??

? ? ? 5.結(jié)合 (5. Combining)?

? ? ?Cat function can be used to concatenate strings.

? ? ? Cat函數(shù)可用于連接字符串。??

? ? ?We need pass an argument to put between concatenated strings using sep parameter. By default, cat ignores missing values but we can also specify how to handle them using na_rep parameter.

? ? ? 我們需要傳遞一個(gè)參數(shù)，以使用sep參數(shù)在串聯(lián)字符串之間放置。默認(rèn)情況下，cat會(huì)忽略缺失值，但我們也可以使用na_rep參數(shù)指定如何處理它們。??

? ? ?Let’s create a new column by concatenating “col_c” and “col_d” with “-” separator.

? ? ? 讓我們通過(guò)將“ col_c”和“ col_d”與“-”分隔符連接起來(lái)創(chuàng)建一個(gè)新列。??

? ? ?df_sample['new']=df_sample['col_c'].str.cat(df_sample['col_d'], sep='-')df_sample

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ??

? ?

? ??

? ? ? 獎(jiǎng)勵(lì)：對(duì)象與字符串 (Bonus: Object vs String)?

? ? ?Before pandas 1.0, only “object” datatype was used to store strings which cause some drawbacks because non-string data can also be stored using “object” datatype. Pandas 1.0 introduces a new datatype specific to string data which is StringDtype. As of now, we can still use object or StringDtype to store strings but in the future, we may be required to only use StringDtype.

? ? ? 在pandas 1.0之前，僅使用“對(duì)象”數(shù)據(jù)類型存儲(chǔ)字符串，這會(huì)帶來(lái)一些缺點(diǎn)，因?yàn)榉亲址當(dāng)?shù)據(jù)也可以使用“對(duì)象”數(shù)據(jù)類型進(jìn)行存儲(chǔ)。 Pandas 1.0引入了特定于字符串?dāng)?shù)據(jù)的新數(shù)據(jù)類型StringDtype 。到目前為止，我們?nèi)匀豢梢允褂胦bject或StringDtype來(lái)存儲(chǔ)字符串，但是在將來(lái)，可能需要我們僅使用StringDtype。??

? ? ?

? ? ? One important thing to note here is that object datatype is still the default datatype for strings. To use StringDtype, we need to explicitly state it.

? ? ? ?這里要注意的一件事是對(duì)象數(shù)據(jù)類型仍然是字符串的默認(rèn)數(shù)據(jù)類型。要使用StringDtype，我們需要明確聲明它。?

? ? ?

? ? ?We can pass “string” or pd.StringDtype() argument to dtype parameter to string datatype.

? ? ? 我們可以將“ string ”或pd.StringDtype()參數(shù)傳遞給dtype參數(shù)，以傳遞給字符串?dāng)?shù)據(jù)類型。??

? ? ?

? ? ??

? ? ? ?

? ? ? ??

? ? ? ? ?

? ? ? ? ??

? ? ? ? ?

? ? ? ??

? ? ? ?

? ? ??

? ? ?

? ??

? ?

? ??

? ? ?Thank you for reading. Please let me know if you have any feedback.

? ? ? 感謝您的閱讀。如果您有任何反饋意見，請(qǐng)告訴我。?

? ??

? ?

? 翻譯自: https://towardsdatascience.com/5-must-know-pandas-operations-on-strings-4f88ca6b8e25

?字符串操作截取后面的字符串

總結(jié)

以上是生活随笔為你收集整理的[转载] 字符串操作截取后面的字符串_对字符串的5个必知的熊猫操作的全部?jī)?nèi)容，希望文章能夠幫你解決所遇到的問(wèn)題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯(cuò)，歡迎將生活随笔推薦給好友。

上一篇： python条件判断true_Pytho
下一篇： react json转换_Typescr