Python for loop tutorial for big data analysis
Python for loops can do much more than traverse lists. In real-world data science work, you may need to loop over other data structures, such as numpy arrays and pandas DataFrames.

This Python for loop tutorial begins with how to use for loops on common Python data structures other than lists, such as tuples and dictionaries. Then we will discuss how to use for loops together with common Python data science libraries: numpy, pandas, and matplotlib. We will also take a close look at the range() function and its role in writing loops.

Quick review: Python for loops

A for loop is a programming statement that tells Python to traverse a collection of objects and perform the same operation on each object in turn.

Each time Python goes through the loop, the loop variable (call it object) takes on the value of the next object in the sequence collection_of_objects, and Python executes the code we wrote on each object in turn.
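The general pattern described above can be sketched as follows. This is a minimal illustration, not code from the original tutorial: the list contents are made up, and the loop variable is named item rather than object to avoid shadowing Python's built-in object.

```python
# Hypothetical collection, used only to illustrate the general pattern.
collection_of_objects = [1, 2, 3]

results = []
for item in collection_of_objects:
    # The body runs once per element; here we double each value.
    results.append(item * 2)

print(results)
```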

Now, let's look at how to use loops with different types of data structures. We will skip lists because they were covered in the previous tutorial. If you need a refresher, see the introductory tutorial or Dataquest's interactive lesson on lists and loops.

Data structures

Tuples

A tuple is a sequence, just like a list. The difference between tuples and lists is that tuples are immutable; that is, they cannot be changed after they are created (learn more about mutable and immutable objects in Python). Tuples also use parentheses instead of square brackets.

Despite these differences, looping over a tuple is very similar to looping over a list.

If we have a list of tuples, we can access the elements in each tuple in the list by including them as variables in the for loop.
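As a small sketch of both cases, the example below loops over a plain tuple and then over a list of tuples, unpacking each tuple into two loop variables. All names and values here are illustrative assumptions.

```python
# Looping over a tuple works just like looping over a list.
x = (10, 20, 30, 40, 50)
total = 0
for value in x:
    total += value

# With a list of tuples, each tuple can be unpacked into separate variables.
pairs = [(1, "one"), (2, "two"), (3, "three")]
labels = []
for number, word in pairs:
    labels.append(f"{number}: {word}")

print(total)
print(labels)
```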

Dictionaries

In addition to lists and tuples, dictionaries are another common Python data type that you are likely to encounter when working with data, and for loops can traverse dictionaries too.

Python dictionaries are composed of key-value pairs, so we need to access two elements (the key and the value) in each loop. Instead of using enumerate() as we would with a list, we iterate over both the key and the corresponding value of each key-value pair by calling the .items() method.

For example, suppose we have a dictionary called stocks, which contains stock ticker symbols and the corresponding stock prices. We will use the dictionary's .items() method to generate a key and a value for each iteration.
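A minimal sketch of this idea, assuming a dictionary named stocks. The ticker symbols and prices below are made-up placeholders, not real quotes.

```python
# Placeholder tickers and prices, for illustration only.
stocks = {"AAA": 10.0, "BBB": 20.5, "CCC": 30.25}

lines = []
for key, value in stocks.items():
    # .items() yields one (key, value) pair per iteration.
    lines.append(f"{key}: {value}")

print(lines)
```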

Note that the names key and value are completely arbitrary. We could just as well label them k and v, or x and y.

Strings

As described in the introductory tutorial, the for loop can also traverse each character in the string. A quick review of how this works:
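For instance, a loop over a short string visits one character per iteration. The string used here is an arbitrary example.

```python
word = "data"

chars = []
for ch in word:
    # Each iteration receives the next character of the string.
    chars.append(ch)

print(chars)
```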

Numpy arrays

Now, let's take a look at some common Python data science packages and how their data types work with for loops.

We will start with how to loop over numpy arrays, so let's begin by creating some arrays of random numbers.

Iterating on a one-dimensional numpy array is very similar to iterating on a list.
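A short sketch of what this might look like, assuming numpy is installed. The seeded random generator is an assumption made here so the run is repeatable; the original tutorial's arrays are not known.

```python
import numpy as np

# Seeded generator so repeated runs give the same values.
rng = np.random.default_rng(seed=0)
x = rng.random(5)  # five floats in the interval [0, 1)

doubled = []
for value in x:
    # Iterating a 1-D array yields one element at a time, like a list.
    doubled.append(value * 2)

print(doubled)
```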

Now, what if we want to traverse a two-dimensional array? If we use the same syntax as above on a two-dimensional array, each iteration yields an entire one-dimensional array (a row) rather than a single element.
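To illustrate the behavior, the sketch below loops directly over a small two-dimensional array with made-up values. Each iteration produces a whole row.

```python
import numpy as np

z = np.array([[1, 2, 3], [4, 5, 6]])

rows = []
for row in z:
    # Each `row` is itself a one-dimensional array.
    rows.append(row.tolist())

print(rows)
```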

A two-dimensional array is built up from one-dimensional arrays. To visit every element rather than every inner array, we can use the numpy function nditer(), which is a multi-dimensional iterator object that takes an array as its argument.

In the following code, we will write a for loop that iterates over each element by passing the two-dimensional array z as the argument to nditer().
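A minimal sketch of this approach, assuming z is built from two one-dimensional arrays named x and y. The values are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.array([x, y])  # two-dimensional array with x and y as its rows

elements = []
for value in np.nditer(z):
    # nditer yields zero-dimensional arrays; int() converts each to a plain number.
    elements.append(int(value))

print(elements)
```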

As we can see, this lists all the elements in x first, followed by all the elements in y.

Remember! When traversing these different data structures, the dictionary needs a method and the numpy array needs a function.

Pandas DataFrames

We often use pandas DataFrames when we process data in Python. Fortunately, we can use for loops to traverse these too.

Let's practice with a small CSV file that records the GDP, capital city, and population of six different countries. We will read it into a pandas DataFrame below.
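The tutorial's CSV file is not reproduced here, so the sketch below reads a tiny in-memory stand-in instead. The column names, country codes, and every value are made-up placeholders (and only two rows are shown rather than six); only the pd.read_csv call itself reflects the step described above.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for the tutorial's CSV file.
csv_text = """Code,Country,Capital,GDP,Population
AA,Country A,Capital A,1.0,10000000
BB,Country B,Capital B,2.0,40000000
"""

# index_col=0 uses the first column (the country code) as the row label.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df)
```

With a real file, the first argument would simply be the file's path.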

Pandas works a bit differently from numpy, so we can't simply repeat the numpy process we have already learned. If we try to iterate over a pandas DataFrame the way we would a numpy array, only the column names are printed:
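This can be checked with a small, hypothetical DataFrame (all labels and values below are placeholders). Looping over the DataFrame directly yields its column names.

```python
import pandas as pd

# Placeholder data; not the tutorial's real figures.
df = pd.DataFrame(
    {
        "Country": ["Country A", "Country B"],
        "Capital": ["Capital A", "Capital B"],
        "GDP": [1.0, 2.0],
        "Population": [10_000_000, 40_000_000],
    },
    index=["AA", "BB"],
)

names = []
for name in df:
    # Iterating a DataFrame directly walks its column labels, not its rows.
    names.append(name)

print(names)
```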

Instead, we need to state explicitly that we want to iterate over the rows of the DataFrame. To do this, we call the iterrows() method on the DataFrame and print the row labels and the row data, where each row's data is an entire pandas Series.
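A sketch of row-wise iteration with iterrows(), again on a hypothetical two-row DataFrame with placeholder values.

```python
import pandas as pd

# Placeholder data; not the tutorial's real figures.
df = pd.DataFrame(
    {
        "Capital": ["Capital A", "Capital B"],
        "GDP": [1.0, 2.0],
        "Population": [10_000_000, 40_000_000],
    },
    index=["AA", "BB"],
)

labels = []
row_types = []
for label, row in df.iterrows():
    # iterrows() yields (row label, row data) pairs; row data is a Series.
    labels.append(label)
    row_types.append(type(row))
    print(label)
    print(row)
```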

We can also access specific values from a pandas Series. Suppose we only want to print the capital of each country. We can specify that we want output only from the "Capital" column.
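For example, indexing each row Series by column name picks out a single value. The DataFrame below is a hypothetical stand-in with placeholder values.

```python
import pandas as pd

# Placeholder data; not the tutorial's real figures.
df = pd.DataFrame(
    {
        "Capital": ["Capital A", "Capital B"],
        "GDP": [1.0, 2.0],
        "Population": [10_000_000, 40_000_000],
    },
    index=["AA", "BB"],
)

capitals = []
for label, row in df.iterrows():
    # row["Capital"] selects one value from the row's Series.
    capitals.append(f"{label}: {row['Capital']}")

print(capitals)
```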

To go beyond simple printouts, let's use a for loop to add a column. We will add a "GDP per capita" column. Remember that .loc[] is label-based. In the following code, we add this column and compute its value for each country by dividing the country's total GDP by its population and multiplying the result by one trillion (because the GDP figures are given in trillions).

For each row in the DataFrame, we create a new label and set the row data equal to the total GDP divided by the country's population, multiplied by $1T to get thousands of dollars.
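A hedged sketch of this step. It assumes GDP is stored in trillions of dollars and Population as a plain head count; both units, the column name "GDP per capita", and all values are assumptions made for illustration.

```python
import pandas as pd

# Placeholder data; GDP in trillions of dollars, Population as a head count.
df = pd.DataFrame(
    {
        "Capital": ["Capital A", "Capital B"],
        "GDP": [1.0, 2.0],
        "Population": [10_000_000, 40_000_000],
    },
    index=["AA", "BB"],
)

ONE_TRILLION = 1_000_000_000_000

for label, row in df.iterrows():
    # .loc[] is label-based: row label first, then column label.
    df.loc[label, "GDP per capita"] = row["GDP"] / row["Population"] * ONE_TRILLION

print(df)
```

The loop mirrors the row-by-row description above. For larger tables, a vectorized expression such as df["GDP"] / df["Population"] * ONE_TRILLION would normally be preferred over iterrows().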

The range() function

We have seen how to use the for loop to iterate over any sequence or data structure. But what if we want to iterate these sequences in a specific order or a specific number of times?

This can be done with Python's built-in range() function. Depending on the number of arguments passed to the function, you can set where the sequence of numbers starts and ends, as well as the difference between one number and the next. Note that, like list indexes, range() starts counting at 0, not 1.

We can call range () in three ways:

A. range(stop)

B. range(start, stop)

C. range(start, stop, step)

range(stop)

range(stop) takes one argument and is used when we want to iterate over a series of numbers starting at 0. It includes all the numbers from 0 up to, but not including, the number we set as the stop.
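A quick illustration with an arbitrary stop value of 3:

```python
numbers = []
for i in range(3):
    # Counts 0, 1, 2; the stop value 3 itself is excluded.
    numbers.append(i)

print(numbers)
```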

range(start, stop)

range(start, stop) takes two arguments. We can set not only the end point of the sequence but also its starting point. range(a, b) generates the numbers from a up to, but not including, b.
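For example, with arbitrary start and stop values of 2 and 5:

```python
numbers = []
for i in range(2, 5):
    # Starts at 2 and stops before 5.
    numbers.append(i)

print(numbers)
```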

range(start, stop, step)

range(start, stop, step) takes three arguments. In addition to the start and stop values, we can also set the difference between one number and the next in the sequence. If no step is provided, it defaults to 1.
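For example, a step of 3 between 0 and 10 (arbitrary values chosen for illustration):

```python
numbers = []
for i in range(0, 10, 3):
    # Each number is 3 more than the previous one; 10 is never reached.
    numbers.append(i)

print(numbers)
```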

Note that the same approach also works with non-numeric sequences.

We can also iterate over a sequence by using the indexes of its elements. The key idea is to calculate the length of the list first, and then iterate over the sequence within the range of that length. Let's look at an example:
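A sketch of index-based iteration. The list is assumed to be called languages and to hold six entries; the specific language names are illustrative placeholders.

```python
# Hypothetical list of six languages.
languages = ["Spanish", "English", "French", "German", "Irish", "Chinese"]

printed = []
for index in range(len(languages)):
    # len(languages) is 6, so index runs from 0 to 5.
    line = f"Language: {languages[index]}"
    printed.append(line)
    print(line)
```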

In the for loop above, we use the variables index and languages, the in keyword, and the range() function to create a sequence of numbers. Note that we also use the len() function, because the list itself is not numeric.

On each iteration, we execute the print statement, so we print one language for each index in range(len(languages)). Because the length of our languages sequence is 6 (the value of len(languages)), we can rewrite the statement as follows:
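The equivalent form with the length written out literally might look like this, using the same hypothetical six-item list:

```python
# Hypothetical list of six languages.
languages = ["Spanish", "English", "French", "German", "Irish", "Chinese"]

printed = []
for index in range(6):
    # range(6) produces the same indexes as range(len(languages)) here.
    printed.append(f"Language: {languages[index]}")

print(printed)
```

Hard-coding the length only works while the list keeps exactly six items, which is why range(len(languages)) is usually the safer choice.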

Plotting with a for loop

Suppose we want to iterate through a collection and use each element to generate a subplot, or even each trace in a single plot. For example, let's take the popular iris dataset (learn more about it) and do some plotting with for loops. Consider the plot below.

Above, we plotted each sepal length against each sepal width, but we can give the plot more meaning by coloring each data point according to the class of its flower. One way to do this is to plot each point separately and pass in the corresponding color using a for loop.
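A rough sketch of the per-point approach, assuming matplotlib is installed. The four rows below are synthetic stand-ins, not the real iris measurements, and the color mapping is an arbitrary choice. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this to view the plot on screen
import matplotlib.pyplot as plt

# Synthetic stand-in rows: (sepal length, sepal width, species). NOT real iris data.
flowers = [
    (5.0, 3.4, "setosa"),
    (6.0, 2.8, "versicolor"),
    (6.5, 3.0, "virginica"),
    (5.1, 3.5, "setosa"),
]
colors = {"setosa": "red", "versicolor": "green", "virginica": "blue"}

fig, ax = plt.subplots()
for length, width, species in flowers:
    # One scatter call per point, colored by the flower's class.
    ax.scatter(length, width, color=colors[species])
ax.set(xlabel="sepal length", ylabel="sepal width")
```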

What if we want to visualize the univariate distributions of some features of the iris dataset? We can do this with plt.subplot(), which creates a single subplot within a grid whose number of columns and rows we can set.

Without going into matplotlib's syntax for now, here is a brief description of each main component of the plot:

1) plt.subplots(): used to create a 2×2 grid and set the overall size.

2) zip(): a built-in Python function that makes it easy to loop over multiple iterables of the same length at the same time.

3) axes.flatten(), where flatten() is a numpy array method: this returns a flattened version of the array of axes.

4) ax.set(): allows us to set all the properties of an axes object with a single method.
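The four components listed above might fit together as in the sketch below. It assumes matplotlib and numpy are installed, and it uses randomly generated stand-in values rather than the real iris measurements; the feature names and histogram settings are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this to view the plot on screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)

# Randomly generated stand-in values. NOT the real iris measurements.
features = {
    "sepal_length": rng.normal(loc=5.0, scale=1.0, size=50),
    "sepal_width": rng.normal(loc=3.0, scale=0.5, size=50),
    "petal_length": rng.normal(loc=4.0, scale=1.0, size=50),
    "petal_width": rng.normal(loc=1.0, scale=0.3, size=50),
}

# 1) Create the 2x2 grid and set the overall figure size.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# 2) and 3) Pair each flattened axes object with one feature.
for ax, (name, values) in zip(axes.flatten(), features.items()):
    ax.hist(values, bins=10)
    # 4) Set several axes properties in a single call.
    ax.set(title=name, xlabel="value", ylabel="count")

fig.tight_layout()
```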

Additional operations

Nested loops

Python allows us to use one loop inside another. This involves an outer loop that has an inner loop within its body.

Consider the following structure:
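A minimal sketch of the nesting, using two arbitrary ranges:

```python
pairs = []
for outer in range(2):  # outer loop: runs twice
    for inner in range(3):  # inner loop: runs fully for each outer value
        pairs.append((outer, inner))

print(pairs)
```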

Nested for loops are useful for iterating over the items of a list that is itself composed of lists. With such a list of lists, if we use only a single for loop, the program outputs each inner list as one item:
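For example, with a hypothetical list named languages that contains two inner lists (the names inside are illustrative placeholders):

```python
# Hypothetical list of two inner lists.
languages = [
    ["Spanish", "English", "French", "German"],
    ["Python", "Java", "Javascript", "C++"],
]

items = []
for lang in languages:
    # Each `lang` is an entire inner list, not a single name.
    items.append(lang)
    print(lang)
```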

To access each item in the internal list, we define a nested for loop:
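A sketch of the nested version, using the same hypothetical two-list structure:

```python
# Hypothetical list of two inner lists.
languages = [
    ["Spanish", "English", "French", "German"],
    ["Python", "Java", "Javascript", "C++"],
]

flattened = []
for group in languages:  # outer loop: one pass per inner list
    for lang in group:  # inner loop: one pass per item in that list
        flattened.append(lang)
        print(lang)
```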

Above, the outer for loop traverses the main list (which in this case contains two lists), and the inner for loop traverses each of those lists in turn. The outer loop executes 2 iterations (one per sublist), and in each iteration we run the inner loop and print all the elements of the corresponding sublist.

This tells us that control starts at the outermost loop, runs through the inner loop, and then returns to the outer for loop, continuing until it has covered the entire range, which in this case is two iterations.

continue and break

Loop control statements change the normal execution order of a for loop.

What if we want to filter out a specific language in the inner loop? We can do this with the continue statement, which lets us skip a specific part of the loop when an external condition is triggered.
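A sketch of continue inside the inner loop, skipping the entry "German". The list of lists is a hypothetical example, as before.

```python
# Hypothetical list of two inner lists.
languages = [
    ["Spanish", "English", "French", "German"],
    ["Python", "Java", "Javascript", "C++"],
]

kept = []
for group in languages:
    for lang in group:
        if lang == "German":
            # Skip only this iteration; the inner loop keeps running.
            continue
        kept.append(lang)

print(kept)
```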

In the loop above, if the language in the inner loop equals "German", we skip that iteration only and carry on with the rest of the loop. The loop does not terminate.

Let's look at the following numerical example:
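One way to write such a loop is sketched below: it walks the numbers 0 to 9, skips the even ones with continue, and squares the rest. Collecting the results in a list named squares is a choice made here for illustration.

```python
squares = []
for i in range(10):
    if i % 2 == 0:
        # Even number: skip the rest of this iteration.
        continue
    squares.append(i ** 2)

print(squares)
```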

Here, we define a loop that traverses the numbers 0 through 9 and squares each one. In each iteration, we check whether the number is divisible by 2. If it is, meaning i is an even number, the continue statement skips that iteration and the loop carries on with the next number.

What about the break statement? It lets us exit the loop completely when an external condition is met. Let's use the same example as above to show how it works:
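A minimal sketch of break on the same 0 to 9 range, exiting when i reaches 7. The list named seen is introduced here only to record which values were visited.

```python
seen = []
for i in range(10):
    if i == 7:
        # Leave the loop entirely; 7, 8 and 9 are never processed.
        break
    seen.append(i)

print(seen)
```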

In the example above, our if statement sets the following condition: if the value of our variable i equals 7, the loop breaks. Our loop therefore iterates over the integers 0 through 6 and then exits completely.

Want more? Here are some other resources that may be useful:

1) Python tutorials: our ever-expanding list of Python tutorials for data science.

2) Data science courses: take your learning to the next level with fully interactive data science and statistics courses, programmed directly in the browser.

Conclusion

In this Python for loop tutorial, we learned some more advanced applications of for loops and how to use them in a typical Python data science workflow.

We learned how to iterate over different types of data structures, and how to use loops with pandas DataFrames and matplotlib to programmatically create multiple traces or subplots.

Finally, we looked at some more advanced techniques that give us greater control over the operation and execution of our for loops.