Review the Past and Learn the New (1): In-Depth Understanding of Strings in Java
Sep 18, 2020, 05:32 PM
In Java, data types fall into two major categories: primitive types and reference types. Among them, String is a special case. It is a reference type, yet it behaves differently from other reference types, which makes it something of an oddity among Java's types. In this article, we take an in-depth look at String in Java.
1. Let's start with how strings are allocated in memory
The previous article, "Reviewing the Past and Learning the New: JVM Memory Allocation You Don't Know", analyzed the JVM memory model in detail. In its section on constant pools, we learned about three kinds of constant pools: the string constant pool, the class file constant pool, and the runtime constant pool. How strings are allocated in memory is closely tied to the string constant pool.
A string can be instantiated in two ways. The first and most common is literal assignment; the other is calling the constructor with an argument. The code is as follows:
```java
String str1 = "abc";
String str2 = new String("abc");
```
How do these two approaches differ in memory allocation? When we first learned Java, our teachers probably explained it like this:
1. Creating a String through literal assignment generates a String object only in the string constant pool.
2. Passing a String argument to the constructor generates a String object both in the heap and in the string constant pool, and the reference to the heap String is stored on the stack.
Is this answer correct? Not entirely, at least not today, because it depends on the Java version used. The previous article, "Reviewing the Past and Learning the New: JVM Memory Allocation You Don't Know", discussed how the HotSpot virtual machine implements the string constant pool in different JDKs. Here is the relevant excerpt:
Before JDK7, the string constant pool was in the method area (the permanent generation), and the pool stored the string objects themselves. In JDK7, the string constant pool was moved from the method area into the heap: string objects are stored in the Java heap, and the pool stores only references to them.
How should we understand this passage? Let's analyze String str1=new String("abc") as an example:
1. Memory allocation in JDK6
Let's first analyze memory allocation in JDK6, as shown in the figure below:

When new String("abc") is called, one "abc" object is created in the Java heap and another in the constant pool, and str1 points to the "abc" object in the heap.
2. Memory allocation in JDK7
In JDK7 and later, because the string constant pool has moved into the heap, memory is allocated differently, as shown in the following figure:

When new String("abc") is called, two "abc" objects are created in the heap: str1 points to one of them, and the constant pool holds a reference to the other.
As for why Java is designed this way, the previous article already explained it: String is the most frequently used data type in Java, so to save memory and improve performance, Java's designers created a string constant pool that is shared by all classes, and each virtual machine has exactly one. Therefore, with literal assignment, if the string already exists in the string constant pool, no new object is created in the heap; the variable simply points to the existing pooled object.
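A minimal sketch of the difference described above. It compares object identity with == and relies only on guarantees from the Java Language Specification (identical literals and string-valued constant expressions are interned; new always allocates), so it says nothing about where a particular JDK physically keeps the pool. The class name LiteralPoolDemo is illustrative.

```java
// Illustrative demo: literal sharing versus explicit construction.
class LiteralPoolDemo {
    // Two identical literals resolve to the same pooled instance (JLS 3.10.5).
    static boolean literalsShareInstance() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // "ab" + "c" is a compile-time constant expression, so it is interned as well.
    static boolean constantExpressionShared() {
        String joined = "ab" + "c";
        return joined == "abc";
    }

    // The constructor always allocates a new object, even though "abc" is already pooled.
    static boolean newCreatesDistinctObject() {
        String pooled = "abc";
        String created = new String("abc");
        return pooled == created;
    }

    public static void main(String[] args) {
        System.out.println("literal == literal:       " + literalsShareInstance());    // true
        System.out.println("constant expr == literal: " + constantExpressionShared()); // true
        System.out.println("literal == new String:    " + newCreatesDistinctObject()); // false
    }
}
```

Note that equals() returns true in all three cases; only identity differs.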
2. String’s intern() method
After understanding how String is allocated in memory, we need to get to know a very important String method: String.intern().
Many readers may not know much about this method, but it does not mean that it is not important. Let's first take a look at the source code of the intern() method:
```java
/**
 * Returns a canonical representation for the string object.
 * <p>
 * A pool of strings, initially empty, is maintained privately by the
 * class {@code String}.
 * <p>
 * When the intern method is invoked, if the pool already contains a
 * string equal to this {@code String} object as determined by
 * the {@link #equals(Object)} method, then the string from the pool is
 * returned. Otherwise, this {@code String} object is added to the
 * pool and a reference to this {@code String} object is returned.
 * <p>
 * It follows that for any two strings {@code s} and {@code t},
 * {@code s.intern() == t.intern()} is {@code true}
 * if and only if {@code s.equals(t)} is {@code true}.
 * <p>
 * All literal strings and string-valued constant expressions are
 * interned. String literals are defined in section 3.10.5 of the
 * <cite>The Java™ Language Specification</cite>.
 *
 * @return  a string that has the same contents as this string, but is
 *          guaranteed to be from a pool of unique strings.
 */
public native String intern();
```
Hmm... it turns out to be a native method. That doesn't matter: even without the source code, its comment tells us a lot. When intern() is called, if the string constant pool already contains a string equal to this String object (as determined by equals()), the reference to the pooled string is returned directly. Otherwise, this String object is added to the pool and a reference to it is returned.
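Because intern() is native, one way to internalize its contract is to model it in plain Java. The sketch below is a hypothetical SimpleInterner backed by a ConcurrentHashMap; it is not how HotSpot implements its string table, it only reproduces the behavior the Javadoc promises: return the pooled equal string if one exists, otherwise pool this object and return it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical user-level model of the intern() contract (not the JVM's string table).
final class SimpleInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the pooled instance equal to s, pooling s itself if none exists yet. */
    String intern(String s) {
        // putIfAbsent returns the value already mapped for an equal key, or null if s was just added
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        SimpleInterner interner = new SimpleInterner();
        String first = new String("hello");
        String second = new String("hello");
        String a = interner.intern(first);   // pool is empty, so first itself is pooled and returned
        String b = interner.intern(second);  // an equal string is already pooled, so first is returned
        System.out.println(a == first);      // true
        System.out.println(b == first);      // true
        System.out.println(second == first); // false
    }
}
```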
1. A simple intern() example
Now that we know what intern() is for, let's look at a simple example:
```java
public class Test {
    public static void main(String[] args) {
        String str1 = "hello world";
        String str2 = new String("hello world");
        String str3 = str2.intern();
        System.out.println("str1 == str2:" + (str1 == str2));
        System.out.println("str1 == str3:" + (str1 == str3));
    }
}
```
What does this code print? Compiling and running it gives the following:

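Once you understand intern(), this result is easy to explain. As the screenshot above shows, the program was run on JDK8.
String str1 = "hello world"; This line first creates an object in the Java heap and puts a reference to it into the string constant pool; str1 points to the object referenced by the pool.
String str2 = new String("hello world"); This line instantiates a String object with new and assigns its reference to str2. It then checks whether the string constant pool already has an object equal to "hello world". If not, it creates another "hello world" object in the heap and puts its reference into the pool; otherwise, nothing more is created. Here, the first line already stored a reference to the "hello world" object in the pool, so the second line does not add another one.
String str3=str2.intern(); This line first checks whether the string constant pool already contains a "hello world" String object and, if so, returns that reference directly. Here, str2.intern() returns exactly the "hello world" object created by the first line.
Therefore, System.out.println("str1 == str3:"+(str1 == str3)); prints true, while the str1 == str2 comparison prints false, because str2 refers to the separate object created by new.
If you switch to JDK6, the output is the same; working out why is left to the reader.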

2. A modified example: intern() revisited
In the previous subsection we used an example to see what intern() does. Next, let's make a few changes to that example:
```java
public class Test {
    public static void main(String[] args) {
        String str1 = new String("he") + new String("llo");
        String str2 = str1.intern();
        String str3 = "hello";
        System.out.println("str1 == str2:" + (str1 == str2));
        System.out.println("str2 == str3:" + (str2 == str3));
    }
}
```
Don't rush to the answer below; first think about what this prints on JDK7 (or later) and on JDK6.
1) Analysis of the JDK8 result
Let's first look at the output on JDK8:

Running the program shows that both comparisons print true. Why? Let's analyze it with a diagram:

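String str1=new String("he")+new String("llo"); In this line, new String("he") and new String("llo") produce four objects in the heap (the two literals "he" and "llo" referenced by the pool, plus the two objects created by new); since they are irrelevant to this example, the diagram omits them. Concatenating them with "+" finally produces a "hello" object, which is assigned to str1. Note that this "hello" is created at runtime, so it is not added to the string constant pool.
String str2=str1.intern(); This line first checks the string constant pool and finds no reference yet to a string equal to "hello". Since a "hello" object already exists in the heap, it puts a reference to that heap object into the string constant pool (rather than creating a copy) and returns that reference.
String str3="hello"; This line finds that the string constant pool already holds a reference to the "hello" object, so str3 points to the object that reference refers to.
At this point str1, str2 and str3 all point to the same "hello" object in the heap, which is why both comparisons print true.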
2) Analysis of the JDK6 result
Now switch the runtime to JDK6 and look at the output:

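Interesting! The same code produces different output on different JDK versions: str1 == str2 now prints false, while str2 == str3 still prints true. What is going on? Again, let's analyze it with a diagram: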

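String str1=new String("he")+new String("llo"); In this line, new String("he") and new String("llo") each create one String object in the Java heap and one in the string constant pool; since they are irrelevant to this example, the diagram omits them. Concatenating them with "+" finally produces a "hello" object in the Java heap, which is assigned to str1.
String str2=str1.intern(); This line finds no "hello" object in the string constant pool yet, so it copies the heap "hello" object into the pool and assigns the pooled copy to str2.
String str3="hello"; This line finds that the pool already holds a "hello" object, so str3 points directly to that pooled object. Now str1 points to the "hello" object in the Java heap, while str2 and str3 both point to the object in the string constant pool, which explains the output above.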
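With these two examples, you should now have a deeper understanding of String's intern() method. So what is intern() actually good for in development? I recommend the two examples in the Meituan tech team's article "深入解析String#intern" (An In-Depth Analysis of String#intern). For reasons of space, this article does not analyze them.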
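Without reproducing the Meituan examples, here is a hypothetical sketch of the general idea: values such as city names are created at runtime as separate objects, and interning them lets equal values share one instance. The class and method names are made up for illustration, and whether interning actually pays off depends on the workload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: deduplicating repeated runtime strings with intern().
class InternDedupDemo {
    static List<String> loadCities(String[] raw, boolean intern) {
        List<String> cities = new ArrayList<>();
        for (String r : raw) {
            // new String(...) simulates a value freshly parsed from input (a distinct object)
            String city = new String(r);
            cities.add(intern ? city.intern() : city);
        }
        return cities;
    }

    public static void main(String[] args) {
        String[] raw = {"Beijing", "Shanghai", "Beijing", "Beijing"};
        List<String> plain = loadCities(raw, false);
        List<String> interned = loadCities(raw, true);
        System.out.println(plain.get(0) == plain.get(2));       // false: two equal but separate copies
        System.out.println(interned.get(0) == interned.get(2)); // true: one shared instance
    }
}
```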
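3. Structure and characteristics of the String class
The previous two sections covered String's memory allocation and its intern() method; both are really analyses of String's memory. So far we have not looked at the structure of the String class or its characteristics, which is what this section covers. First, a snippet that gives a rough idea of the structure of the String class (taken from JDK8):
```java
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    // ...
}
```
String implements the Serializable, Comparable and CharSequence interfaces, which means it can be serialized and sorted conveniently. String is also declared final, so it cannot be subclassed. Internally it maintains a char array, which shows that String is built on a char array. This array is declared final as well, which is commonly cited as the reason String is immutable. Strictly speaking, final only prevents the field from being reassigned; the characters never change because the array is private, String never modifies it after construction, and no method exposes it to callers.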
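The difference between a final field and an immutable object can be seen in a few lines. The two holder classes below are hypothetical: LeakyText has a final array field but shares the caller's array, while SafeText copies its input into a private array, roughly the way String protects its value.

```java
import java.util.Arrays;

// Hypothetical demo: `final` on an array field stops reassignment, not element changes.
class FinalArrayDemo {
    // Final field, but the array is shared with the caller and reachable from outside.
    static final class LeakyText {
        final char[] value;
        LeakyText(char[] value) { this.value = value; }           // no defensive copy
        @Override public String toString() { return new String(value); }
    }

    // String-like approach: private field, defensive copy on the way in, never mutated.
    static final class SafeText {
        private final char[] value;
        SafeText(char[] value) { this.value = Arrays.copyOf(value, value.length); }
        @Override public String toString() { return new String(value); }
    }

    public static void main(String[] args) {
        char[] data = {'a', 'b', 'c'};
        LeakyText leaky = new LeakyText(data);
        SafeText safe = new SafeText(data);
        data[0] = 'X';                 // mutate the caller's array after construction
        leaky.value[1] = 'Y';          // allowed: the field is final, its elements are not
        System.out.println(leaky);     // XYc
        System.out.println(safe);      // abc
    }
}
```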
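1. Differences in String across JDK versions
The Java team has kept optimizing the String class, so its implementation differs slightly between JDK versions, although this is invisible in everyday use. The figure below lists some changes in the String source from JDK6 to JDK9.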

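In Java 6 and earlier, String maintained a char array, an offset, a character count count, and a hash value hash. A String object used offset and count to locate its characters within the char[] array. This allowed array objects to be shared efficiently and quickly while saving memory, but it could easily lead to memory leaks: a short substring kept a reference to the original, possibly huge, array, so that array could not be garbage-collected.
Java 7 and Java 8 removed the offset and count fields. As a result, a String object takes slightly less memory, and String.substring no longer shares the char[], which fixes the memory leaks that substring could cause.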
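To make the leak concrete, the sketch below models both strategies. SharedSlice and copySubstring are hypothetical simplifications, not JDK code: the first mimics Java 6's (value, offset, count) view, the second mimics the copying done since Java 7.

```java
import java.util.Arrays;

// Hypothetical models of the two substring strategies (not the actual JDK source).
class SubstringModels {
    // Java 6 style: a view over a shared array, located by offset and count.
    static final class SharedSlice {
        final char[] value;
        final int offset;
        final int count;

        SharedSlice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // O(1): nothing is copied, but the result keeps the whole array reachable.
        SharedSlice substring(int begin) {
            return new SharedSlice(value, offset + begin, count - begin);
        }

        @Override public String toString() { return new String(value, offset, count); }
    }

    // Java 7+ style: copy only the characters that are needed.
    static char[] copySubstring(char[] value, int begin) {
        return Arrays.copyOfRange(value, begin, value.length);
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        SharedSlice tiny = new SharedSlice(big, 0, big.length).substring(big.length - 3);
        char[] copied = copySubstring(big, big.length - 3);
        System.out.println(tiny + " retains an array of length " + tiny.value.length);         // xxx retains an array of length 1000000
        System.out.println(new String(copied) + " retains an array of length " + copied.length); // xxx retains an array of length 3
    }
}
```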
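Starting with Java 9, the char array in String was replaced by a byte[] array. A char occupies two bytes, while a byte occupies one. So when a string's characters each fit in a single byte, a char array spends one extra byte per character compared with a byte array; the byte[] representation (known as Compact Strings) halves the memory used by such strings. Java 9 also added a coder field that identifies the encoding. When computing the string length or calling indexOf(), String consults this field to decide how to interpret the bytes. coder has two possible values: 0 means Latin-1 (a single-byte encoding) and 1 means UTF-16. If String determines that the string contains only Latin-1 characters, coder is 0; otherwise it is 1.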
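The coder idea can be sketched as a tiny class. CompactText is hypothetical and much simpler than the JDK: it stores Latin-1 text as one byte per character and everything else as big-endian UTF-16 (the real String uses the platform's native byte order for UTF-16), and it computes the length by shifting the byte count by the coder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of Java 9+ compact strings: byte[] plus a coder flag.
final class CompactText {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    CompactText(String s) {
        if (canEncodeLatin1(s)) {
            value = s.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per char
            coder = LATIN1;
        } else {
            value = s.getBytes(StandardCharsets.UTF_16BE);   // 2 bytes per char, no BOM
            coder = UTF16;
        }
    }

    private static boolean canEncodeLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) return false;
        }
        return true;
    }

    // Shift by 0 for Latin-1 (bytes == chars), by 1 for UTF-16 (bytes / 2 == chars).
    int length() { return value.length >> coder; }

    char charAt(int index) {
        if (coder == LATIN1) return (char) (value[index] & 0xFF);
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    byte coder() { return coder; }
    int byteSize() { return value.length; }

    public static void main(String[] args) {
        CompactText ascii = new CompactText("hello");
        CompactText chinese = new CompactText("\u4f60\u597d"); // two CJK characters
        System.out.println(ascii.coder() + " " + ascii.length() + " " + ascii.byteSize());       // 0 5 5
        System.out.println(chinese.coder() + " " + chinese.length() + " " + chinese.byteSize()); // 1 2 4
    }
}
```

The JDK's own String.length() in Java 9+ uses the same shift (value.length >> coder()); the rest of this class is illustration only.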
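2. Analysis of substring, concatenation and related String operations
At the start of this section we saw that strings are immutable. Why, then, can we still cut a string with substring, or even join strings directly with the "+" operator?
(1) How String.substring is implemented
To see how substring works, we can simply read the String source code: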
```java
public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = value.length - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}
```
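This code shows that substring is implemented by instantiating a new String object. Therefore, if a project performs a large amount of substring work, it is best to avoid String where possible and use the better-performing StringBuilder or StringBuffer instead.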
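A quick way to confirm that substring leaves the original untouched and hands back a different object. The class name is illustrative; the substring(0) line reflects the beginIndex == 0 shortcut in the source above, which later JDKs also keep.

```java
// Illustrative check of substring's behavior on an immutable String.
class SubstringDemo {
    public static void main(String[] args) {
        String original = "immutable";
        String tail = original.substring(2);
        System.out.println(original);                          // immutable (unchanged)
        System.out.println(tail);                              // mutable
        System.out.println(tail == original);                  // false: a new String was created
        System.out.println(original.substring(0) == original); // true: nothing to cut, `this` is returned
    }
}
```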
(2) How String concatenation is implemented
1) Performance comparison of concatenation approaches
There are many ways to concatenate strings. Here we compare the performance of three of them:
Concatenating with the "+" operator:
```java
public class Test {
    private static final int COUNT = 50000;

    public static void main(String[] args) {
        String str = "";
        for (int i = 0; i < COUNT; i++) {
            str = str + "abc";
        }
    }
}
```
Concatenating with String's concat() method:
```java
public class Test {
    private static final int COUNT = 50000;

    public static void main(String[] args) {
        String str = "";
        for (int i = 0; i < COUNT; i++) {
            str = str.concat("abc");
        }
    }
}
```
Concatenating with StringBuilder's append() method:
```java
public class Test {
    private static final int COUNT = 50000;

    public static void main(String[] args) {
        StringBuilder str = new StringBuilder();
        for (int i = 0; i < COUNT; i++) {
            str.append("abc");
        }
    }
}
```
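The code above performs 50,000 concatenations with each of the three approaches, and each approach was run 20 times. The measured times are summarized in the following table:
| Concatenation method | Min (ms) | Max (ms) | Avg (ms) |
|---|---|---|---|
| "+" operator | 4868 | 5146 | 4924 |
| String.concat() | 2227 | 2456 | 2296 |
| StringBuilder.append() | 4 | 12 | 6.6 |
The data show at a glance that the "+" operator performs worst, averaging 4924 ms. String.concat() comes next, still averaging 2296 ms. StringBuilder.append() performs best by far, averaging only 6.6 ms. This is why using "+" for string concatenation, particularly inside loops, is discouraged in development.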
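If you want to reproduce the comparison, a rough harness might look like the following. ConcatBench is hypothetical; a single System.nanoTime() measurement is noisy (JIT warm-up, garbage collection), and a dedicated harness such as JMH gives more trustworthy numbers. Expect absolute timings to differ from the table on other JDKs and machines.

```java
import java.util.function.IntFunction;

// Hypothetical, rough timing harness for the three concatenation approaches.
class ConcatBench {
    static String withPlus(int count) {
        String str = "";
        for (int i = 0; i < count; i++) {
            str = str + "abc";
        }
        return str;
    }

    static String withConcat(int count) {
        String str = "";
        for (int i = 0; i < count; i++) {
            str = str.concat("abc");
        }
        return str;
    }

    static String withBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("abc");
        }
        return sb.toString();
    }

    // Runs one approach once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(IntFunction<String> approach, int count) {
        long start = System.nanoTime();
        String result = approach.apply(count);
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (result.length() != 3 * count) throw new IllegalStateException("unexpected length");
        return elapsed;
    }

    public static void main(String[] args) {
        int count = 50_000;
        System.out.println("+        : " + timeMillis(ConcatBench::withPlus, count) + " ms");
        System.out.println("concat() : " + timeMillis(ConcatBench::withConcat, count) + " ms");
        System.out.println("append() : " + timeMillis(ConcatBench::withBuilder, count) + " ms");
    }
}
```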
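2) How the three concatenation approaches work
How the "+" operator works
The "+" operator is handled by the compiler and the JVM rather than by library code, so we cannot read its implementation directly. However, Java provides the javap tool, which disassembles a class file; from the bytecode instructions we can roughly see how "+" is implemented.
```java
public class Test {
    private static final int COUNT = 50000;

    public static void main(String[] args) {
        String str = "";
        for (int i = 0; i < COUNT; i++) {
            str = str + "abc";
        }
    }
}
```
After compiling this code and running javap -c on the class, we get the following: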

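Note that the instruction at "11:" instantiates a StringBuilder, the one at "19:" calls StringBuilder's append() method, and the one at "27:" calls StringBuilder's toString() method. So "+" concatenation is also implemented with StringBuilder. (This is how JDK8's javac compiles it; starting with JDK9, javac instead emits an invokedynamic call to StringConcatFactory, so the javap output looks different.) Why, then, is it so much slower than using StringBuilder directly? It becomes obvious once we rewrite the code in its equivalent compiled form:
```java
public class Test {
    private static final int COUNT = 50000;

    public static void main(String[] args) {
        String str = "";
        for (int i = 0; i < COUNT; i++) {
            // every iteration creates a new StringBuilder and a new String
            str = new StringBuilder(str).append("abc").toString();
        }
    }
}
```
As you can see, although this code also uses StringBuilder, the StringBuilder is instantiated inside the loop: 50,000 iterations create 50,000 StringBuilder objects and call toString() 50,000 times, and each iteration copies the entire, ever-growing string again. No wonder it takes so long!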
How String.concat works
For concat, we can look directly at its source inside String:
```java
public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}
```
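concat first copies the array with Arrays.copyOf, then copies again through getChars, and finally creates and returns a new String with new. This means every call to concat still produces a new String object, but compared with the "+" operator it avoids creating a StringBuilder and calling toString(). That is why it performs considerably better than "+".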
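The same steps can be retraced with public APIs only. This ConcatSketch is a hypothetical re-creation, not the JDK method: String's internal array is not accessible, so it copies the receiver with getChars instead of Arrays.copyOf, and JDK8's package-private String(char[], boolean) constructor shares the buffer, whereas the public String(char[]) constructor used here copies it once more.

```java
// Hypothetical re-creation of JDK8 String.concat using public APIs only.
class ConcatSketch {
    static String concat(String self, String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return self;                       // nothing to append: return the receiver itself
        }
        int len = self.length();
        char[] buf = new char[len + otherLen]; // one buffer sized for the result
        self.getChars(0, len, buf, 0);         // copy the receiver's characters
        str.getChars(0, otherLen, buf, len);   // copy the argument right after them
        return new String(buf);                // public constructor copies buf once more
    }

    public static void main(String[] args) {
        System.out.println(concat("Hello, ", "world")); // Hello, world
        String s = "unchanged";
        System.out.println(concat(s, "") == s);         // true
    }
}
```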
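As for StringBuilder, there is little more to analyze: the "+" operator is itself built on StringBuilder, it just creates a large number of objects during concatenation, whereas concatenating with StringBuilder directly creates only a single StringBuilder object, whose internal buffer grows as needed.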
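For intuition about why a single StringBuilder is cheap, here is a much-simplified, hypothetical MiniBuilder: one mutable char buffer that grows geometrically, loosely modeled on JDK8's rule of growing to twice the old capacity plus 2. Repeated appends are therefore amortized O(1), and only the final toString() copies the result into a String.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a growable append buffer (not AbstractStringBuilder).
final class MiniBuilder {
    private char[] buf = new char[16]; // StringBuilder's default capacity is also 16
    private int count;

    MiniBuilder append(String s) {
        int needed = count + s.length();
        if (needed > buf.length) {
            // grow to at least (old * 2 + 2) so that reallocations become rare
            buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2 + 2));
        }
        s.getChars(0, s.length(), buf, count); // write directly into the shared buffer
        count = needed;
        return this;
    }

    int capacity() { return buf.length; }

    // The single copy into an immutable String happens here.
    @Override public String toString() { return new String(buf, 0, count); }

    public static void main(String[] args) {
        MiniBuilder sb = new MiniBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append("abc");
        }
        System.out.println(sb.toString().length());  // 150000
        System.out.println(sb.capacity() >= 150_000); // true
    }
}
```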
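4. Summary
In this article we analyzed in depth String's memory allocation, its intern() method, and the structure and characteristics of the String class. Articles on this topic online vary widely in quality and often contradict one another; I consulted many of them and combined them with my own understanding for this analysis. Even so, some misunderstandings may have slipped in; if you spot any, please discuss and correct them. This article also mentioned StringBuilder several times but, for reasons of space, did not analyze it in detail. Don't worry: I will cover it in the next article. Either way, after reading this article you should have a deeper understanding of String, especially of the performance problems that substring and concatenation can cause, which are best avoided in future development.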
The above is the detailed content of Review the past and learn the new (1) In-depth understanding of strings in Java. For more information, please follow other related articles on the PHP Chinese website!
