Parameterization and string substitution
String substitution
In the context of building a database query like so:
CREATE TABLE fish (name TEXT, species TEXT, tank_number INTEGER)
you would likely want to include for example user input for say a search query into a database. There are two ways to go about it and the first way to build a query is via string substitution and this is the bad way.
Say the user input is in the variable name
and the user can put whatever name they like including special characters. We have our search query to find the particular name
that the user inputted as follows:
query = f"SELECT * from user_table WHERE name='{name}'"
Using f-string we are interpolation the variable with the query and this give the chance for SQL Injection attack to occur. This is because the user can specify something like name = "Tom' OR 1=1"
as it's input and when the variable is used for interpolation for building the query it results into:
name = "Tom' OR 1 = 1"
"SELECT * from user_table WHERE name='{name}'"
"SELECT * from user_table WHERE name='Tom' OR 1 = 1"
And when the query is executed it will pick everyone from the database and retrieve their information (possibly private information).
Parameterized queries
Now for a much safer approach to building query is via parameterized queries. In this case, the variables that are used to build the query are pass as parameters and not used directly in a string interpolation. How you do this will depend on the library that you use, for example, in Python when you execute a query you can pass a parameterized string along with the parameters to build a parameterized queries.
The idea is that if the user input are passed as parameters, they no longer have the chance to mess with the query since there are no interpolation, the user input is not used to build a query but rather as a parameter. The variables themselves will no long be used as part of an executable code but rather treated as literal values.
query = "SELECT * from user_table WHERE name=?"
params = (name)
cursor.execute(query, params)
Now no matter what the user input, even Tom' OR 1 = 1
as name
it will be treated literally, as you are looking for a person named Tom' OR 1 = 1
in the database.