Introduction to Linux and Python

Key Points

Introducing the Shell
  • A shell is a program whose primary purpose is to read commands and run other programs.

  • The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.

  • The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.

Navigating Files and Directories
  • The file system is responsible for managing information on the disk.

  • Information is stored in files, which are stored in directories (folders).

  • Directories can also store other directories, which forms a directory tree.

  • cd path changes the current working directory.

  • ls path prints a listing of a specific file or directory; ls on its own lists the current working directory.

  • pwd prints the user’s current working directory.

  • whoami shows the user’s current identity.

  • / on its own is the root directory of the whole file system.

  • A relative path specifies a location starting from the current location.

  • An absolute path specifies a location from the root of the file system.

  • Directory names in a path are separated with ‘/’ on Unix, but ‘\’ on Windows.

  • ‘..’ means ‘the directory above the current one’; ‘.’ on its own means ‘the current directory’.

  • Most files’ names are something.extension. The extension isn’t required, and doesn’t guarantee anything, but is normally used to indicate the type of data in the file.

  • Most commands take options (flags) which begin with a ‘-’.

Working With Files and Directories
  • cp old new copies a file.

  • mkdir path creates a new directory.

  • mv old new moves (renames) a file or directory.

  • rm path removes (deletes) a file.

  • Use of the Control key may be described in many ways, including Ctrl-X, Control-X, and ^X.

  • The shell does not have a trash bin: once something is deleted, it’s really gone.

  • Depending on the type of work you do, you may need a more powerful text editor than Nano.

Pipes and Filters
  • cat displays the contents of its inputs.

  • head displays the first few lines of its input.

  • tail displays the last few lines of its input.

  • sort sorts its inputs.

  • wc counts lines, words, and characters in its inputs.

  • * matches zero or more characters in a filename, so *.txt matches all files ending in .txt.

  • ? matches any single character in a filename, so ?.txt matches a.txt but not any.txt.

  • command > file redirects a command’s output to a file.

  • first | second is a pipeline: the output of the first command is used as the input to the second.

  • The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).

Loops
  • A for loop repeats commands once for every thing in a list.

  • Every for loop needs a variable to refer to the thing it is currently operating on.

  • Use $name to expand a variable (i.e., get its value). ${name} can also be used.

  • Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as they complicate variable expansion.

  • Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.

  • Use the up-arrow key to scroll up through previous commands to edit and repeat them.

  • Use Ctrl-R to search through the previously entered commands.

  • Use history to display recent commands, and !number to repeat a command by number.

Shell Scripts
  • Save commands in files (usually called shell scripts) for re-use.

  • bash filename runs the commands saved in a file.

  • $@ refers to all of a shell script’s command-line parameters.

  • $1, $2, etc., refer to the first command-line parameter, the second command-line parameter, etc.

  • Place variables in quotes if the values might have spaces in them.

  • Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.

Finding Things
  • find finds files with specific properties that match patterns.

  • grep selects lines in files that match patterns.

  • --help is a flag supported by many Bash commands, and by programs that can be run from within Bash, that displays more information on how to use them.

  • man command displays the manual page for a given command.

  • $(command) inserts a command’s output in place.

Running and Quitting
  • The python, ipython, and idle commands all start an interactive Python shell (REPL).

  • Python programs are plain text files.

  • You can use IDLE for creating and running Python programs.

Variables and Assignment
  • Use variables to store values.

  • Use print to display values.

  • Variables must be created before they are used.

  • Variables can be used in calculations.

  • Use an index to get a single character from a string.

  • Use a slice to get a substring.

  • Use the built-in function len to find the length of a string.

  • Python is case-sensitive.

  • Use meaningful variable names.
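
The points above can be sketched in a few lines. The variable names and values here are made up purely for illustration.

```python
# Hypothetical values chosen only for illustration.
age = 42
first_name = "Ahmed"

# Variables can be used in calculations.
age_in_three_years = age + 3

# An index gets a single character; a slice gets a substring.
first_letter = first_name[0]     # 'A'
first_three = first_name[0:3]    # 'Ahm'

# len reports how many characters the string holds.
name_length = len(first_name)    # 5

print(first_name, "is", age, "years old")
print(first_letter, first_three, name_length)

# Python is case-sensitive: 'Age' was never created, so using it fails.
try:
    print(Age)
except NameError as error:
    print("NameError:", error)
```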

Data Types and Type Conversion
  • Every value has a type.

  • Use the built-in function type to find the type of a value.

  • Types control what operations can be done on values.

  • Strings can be added and multiplied.

  • Strings have a length (but numbers don’t).

  • Numbers must be converted to strings, or vice versa, before the two can be combined in an operation.

  • Can mix integers and floats freely in operations.

  • Variables only change value when something is assigned to them.
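
As a rough illustration of these points, the sketch below uses made-up names (count, label) to show type checks, string operations, explicit conversion, and the fact that a variable keeps its value until it is reassigned.

```python
count = 3          # hypothetical example values
label = "sample"

print(type(count))   # <class 'int'>
print(type(label))   # <class 'str'>

# Strings can be added (concatenated) and multiplied (repeated).
repeated = label * 2             # 'samplesample'

# A number must be converted before it is joined to a string...
combined = label + str(count)    # 'sample3'
# ...and a string must be converted before it is used in arithmetic.
total = count + int("4")         # 7

# Integers and floats mix freely; the result is a float.
mixed = count + 0.5              # 3.5

# Strings have a length, but numbers do not.
try:
    len(count)
except TypeError as error:
    print("TypeError:", error)

# 'doubled' was computed from the old value of 'count' and does not
# change when 'count' is reassigned afterwards.
doubled = count * 2
count = 10
print(doubled)   # 6
```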

Built-in Functions and Help
  • Use comments to add documentation to programs.

  • A function may take zero or more arguments.

  • Commonly-used built-in functions include max, min, and round.

  • Functions may only work for certain (combinations of) arguments.

  • Functions may have default values for some arguments.

  • Use the built-in function help to get help for a function.

  • Every function returns something.

  • Python reports a syntax error when it can’t understand the source of a program.

  • Python reports a runtime error when something goes wrong while a program is executing.

  • Fix syntax errors by reading the source code, and runtime errors by tracing the program’s execution.
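
A short sketch of these ideas follows. Calling compile on a deliberately broken snippet is just one convenient way to trigger a syntax error from inside a running program; it is not the only way to see one.

```python
# Comments like this one document the program for human readers.

largest = max(1, 2, 3)             # 3
# Characters compare by code point, so the digit sorts first.
smallest = min("a", "A", "0")      # '0'

# round has a default for its second argument (number of decimals).
rounded_default = round(3.712)     # 4
rounded_one = round(3.712, 1)      # 3.7

# Every function returns something; print returns None.
result = print("example")
print(result)                      # None

# Functions may only work for certain combinations of arguments:
# comparing a number with a string is a runtime error.
try:
    max(1, "a")
except TypeError as error:
    print("runtime error:", error)

# A syntax error means Python could not understand the source at all.
try:
    compile("name = 'unclosed", "<example>", "exec")
except SyntaxError as error:
    print("syntax error:", error.msg)

# help(round) would print the documentation for round.
```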

Libraries
  • Most of the power of a programming language is in its libraries.

  • A program must import a library module in order to use it.

  • Use help to learn about the contents of a library module.

  • Import specific items from a library to shorten programs.

  • Create an alias for a library when importing it to shorten programs.
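
The three import styles mentioned above can be compared side by side. The standard-library math module is used here only as a convenient example.

```python
# Import the whole module and refer to its contents with a prefix.
import math
print(math.pi)

# Import specific items so they can be used without the prefix.
from math import cos, pi
cosine_of_pi = cos(pi)     # -1.0

# Import under a shorter alias.
import math as m
root = m.sqrt(16)          # 4.0

print(cosine_of_pi, root)

# help(math) would list the contents of the module.
```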

Reading Tabular Data into DataFrames
  • Use the Pandas library to do statistics on tabular data.

  • Use index_col to specify that a column’s values should be used as row headings.

  • Use DataFrame.info to find out more about a dataframe.

  • The DataFrame.columns variable stores information about the dataframe’s columns.

  • Use DataFrame.T to transpose a dataframe.

  • Use DataFrame.describe to get summary statistics about data.
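
A minimal sketch of these calls is shown below, assuming Pandas is installed. The table contents and the column names (site, year_1, year_2) are invented for this example, and the data is read from an in-memory string so the snippet is self-contained; a real program would normally pass a file name to read_csv.

```python
import io

import pandas as pd

# Made-up CSV contents used only for this sketch.
csv_text = """site,year_1,year_2
site_a,10.0,34.0
site_b,12.0,25.0
"""

# index_col makes the 'site' column the row headings.
data = pd.read_csv(io.StringIO(csv_text), index_col="site")

data.info()               # prints a summary: index, columns, types
print(data.columns)       # the column labels
print(data.T)             # transposed: rows and columns swapped
summary = data.describe() # count, mean, std, min, quartiles, max
print(summary)
```

Note that columns and T are attributes, so they are written without parentheses, while info and describe are methods and need them.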

Pandas DataFrames
  • Use DataFrame.iloc[..., ...] to select values by integer location.

  • Use : on its own to mean all columns or all rows.

  • Select multiple columns or rows using DataFrame.loc and a named slice.

  • Result of slicing can be used in further operations.

  • Use comparisons to select data based on value.

  • Select values or NaN using a Boolean mask.
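
The selection techniques above might look like this in practice, assuming Pandas is installed. The dataframe is built by hand from made-up numbers and labels.

```python
import pandas as pd

# Hypothetical data, constructed only for this sketch.
data = pd.DataFrame(
    {
        "year_1": [1.0, 4.0, 7.0],
        "year_2": [2.0, 5.0, 8.0],
        "year_3": [3.0, 6.0, 9.0],
    },
    index=["site_a", "site_b", "site_c"],
)

# Select by integer position: first row, first column.
first_value = data.iloc[0, 0]                # 1.0

# ':' on its own means "all columns" (or "all rows").
row_b = data.loc["site_b", :]

# Named slices with loc include both end points.
subset = data.loc["site_a":"site_b", "year_2":"year_3"]

# The result of slicing can be used in further operations.
column_maxima = subset.max()

# A comparison yields a Boolean mask of the same shape.
mask = subset > 4

# Applying the mask keeps values where it is True and gives NaN elsewhere.
masked = subset[mask]
print(masked)
```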

Plotting
  • matplotlib is the most widely used scientific plotting library in Python.

  • Plot data directly from a Pandas dataframe.

  • Select and transform data, then plot it.

  • Many styles of plot are available.

  • Can plot many sets of data together.

Lists
  • A list stores many values in a single structure.

  • Use an item’s index to fetch it from a list.

  • Lists’ values can be replaced by assigning to them.

  • Appending items to a list lengthens it.

  • Use del to remove items from a list entirely.

  • The empty list contains no values.

  • Lists may contain values of different types.

  • Character strings can be indexed like lists.

  • Character strings are immutable.

  • Indexing beyond the end of the collection is an error.
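
The list operations above can be sketched as follows, using made-up values.

```python
pressures = [0.273, 0.275, 0.277]   # hypothetical measurements

first = pressures[0]        # fetch an item by index
pressures[0] = 0.265        # replace a value by assigning to it
pressures.append(0.269)     # appending lengthens the list
del pressures[1]            # remove an item entirely
print(pressures)            # [0.265, 0.277, 0.269]

empty = []                          # the empty list contains no values
mixed = [49, "helium", 27.3]        # values of different types

# Strings can be indexed like lists, but they are immutable.
element = "carbon"
first_character = element[0]        # 'c'
try:
    element[0] = "C"
except TypeError as error:
    print("TypeError:", error)

# Indexing beyond the end of a collection is an error.
try:
    pressures[99]
except IndexError as error:
    print("IndexError:", error)
```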

For Loops
  • A for loop executes commands once for each value in a collection.

  • The first line of the for loop must end with a colon, and the body must be indented.

  • Indentation is always meaningful in Python.

  • A for loop is made up of a collection, a loop variable, and a body.

  • Loop variables can be called anything (but giving the loop variable a meaningful name is strongly advised).

  • The body of a loop can contain many statements.

  • Use range to iterate over a sequence of numbers.

  • The Accumulator pattern turns many values into one.
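
As an illustration, the sketch below shows a loop over a list, a loop over range, and the accumulator pattern. The numbers are arbitrary.

```python
# The collection is the list, the loop variable is 'number',
# and the indented lines form the body.
squares = []
for number in [2, 3, 5]:
    square = number * number
    squares.append(square)
print(squares)    # [4, 9, 25]

# Accumulator pattern: start from an initial value and update it on
# each pass. range(10) produces the integers 0 through 9.
total = 0
for number in range(10):
    total = total + number
print(total)      # 45
```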

Looping Over Data Sets
  • Use a for loop to process files given a list of their names.

  • Use glob.glob to find sets of files whose names match a pattern.

  • Use glob and for to process batches of files.
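
One way to combine glob and for is sketched below. So that the example runs anywhere, it first writes a few throwaway files into a temporary directory; the file names and contents are invented for the sketch.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create hypothetical files to process.
    for name in ["data_a.csv", "data_b.csv", "notes.txt"]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("x,y\n1,2\n")

    # glob.glob returns the names matching the pattern, in arbitrary
    # order, so they are sorted here to make the output predictable.
    matches = sorted(glob.glob(os.path.join(folder, "data_*.csv")))

    # Process each matching file in turn.
    line_counts = {}
    for filename in matches:
        with open(filename) as handle:
            line_counts[os.path.basename(filename)] = len(handle.readlines())

print(line_counts)   # {'data_a.csv': 2, 'data_b.csv': 2}
```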

Writing Functions
  • Break programs down into functions to make them easier to understand.

  • Define a function using def with a name, parameters, and a block of code.

  • Defining a function does not run it.

  • Arguments in a call are matched to parameters in the definition.

  • Functions may return a result to their caller using return.
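
A small sketch of defining and calling a function follows. The function name format_date and its parameters are hypothetical.

```python
# Defining the function does not run its body.
def format_date(year, month, day):
    """Return the date as a 'year/month/day' string."""
    return str(year) + "/" + str(month) + "/" + str(day)

# Arguments are matched to parameters by position...
positional = format_date(1871, 3, 19)

# ...or by name, in which case the order does not matter.
by_name = format_date(day=19, month=3, year=1871)

print(positional)   # 1871/3/19
print(by_name)      # 1871/3/19
```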

Variable Scope
  • The scope of a variable is the part of a program that can ‘see’ that variable.
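
Scope can be seen in a short example. The names pressure, adjust, and temperature are made up for this sketch.

```python
pressure = 103.9    # defined outside any function: visible everywhere below

def adjust(reading):
    # 'temperature' is created inside the function, so it is local to it.
    temperature = reading * 1.43 / pressure
    return temperature

adjusted = adjust(0.9)
print(adjusted)

# Outside the function, the local variable cannot be seen.
try:
    temperature
    seen_outside = True
except NameError:
    seen_outside = False
print("temperature visible outside adjust:", seen_outside)
```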

Conditionals
  • Use if statements to control whether or not a block of code is executed.

  • Conditionals are often used inside loops.

  • Use else to execute a block of code when an if condition is not true.

  • Use elif to specify additional tests.

  • Conditions are tested once, in order.

  • Create a table showing variables’ values to trace a program’s execution.
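
The sketch below puts a conditional inside a loop, using made-up masses and thresholds. Because conditions are tested in order and only the first true branch runs, 9.22 is labelled 'HUGE' even though it is also greater than 3.0.

```python
masses = [3.54, 2.07, 9.22, 1.86, 1.71]   # hypothetical values

labels = []
for mass in masses:
    if mass > 9.0:
        labels.append("HUGE")
    elif mass > 3.0:
        labels.append("large")
    else:
        labels.append("small")

print(labels)   # ['large', 'small', 'HUGE', 'small', 'small']
```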

Programming Style
  • Follow standard Python style in your code.

  • Use docstrings to provide built-in help for your functions.
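
For instance, a docstring is a string placed as the first statement in a function body. The function average below is a hypothetical example, written with the lowercase, underscore-separated naming that standard Python style recommends.

```python
def average(values):
    """Return the mean of a non-empty list of numbers."""
    return sum(values) / len(values)

mean_value = average([1, 2, 3])
print(mean_value)         # 2.0

# The docstring is stored on the function and is what help displays.
print(average.__doc__)
# help(average) would print the same text along with the signature.
```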

Wrap-Up
  • Python supports a large community within and outwith research.

Feedback
  • We are constantly seeking to improve this course.