Python
Wherever possible you should use Python3
, unless there is a compelling reason not to (e.g. you need to use a package that is only available for Python2
)
Structure
For a useful summary of best practice when structuring a project see the Hitchhikers Guide to Python.
requirements.txt
All python projects should have a requirements.txt
file in the root directory.
The pipreqs
package lets you generate these automatically given a folder of .py
files.
pandas==0.21.0
pytest==3.2.5
textract==1.6.1
more_itertools==3.2.0
numpy==1.13.3
Parameters
Avoid hard coded parameters wherever possible: use a separate file (parameters.py
or param.json
or some other single place to set params you're likely to change).
Detailing the parameters as a table in README.md
can also be useful.
Notebooks (.ipynb)
Please see the section on Jupyter Notebooks if you use them.
Code
These shouldn't be too prescriptive or limiting - the intention is to set a standard to enable others who use your code to get started quickly.
Style
The Hitchhikers Guide has a comprehensive style guide for best practice, but this is a bit prescriptive. To keep us on the same page for collaboration code should:
- Confirm to PEP8 (see
pycodestyle
for linting in text editors andautopep8
orYAPF
for automatic correction) - Use spaces instead of tabs
Docstrings
Even simple functions should have a docstring. What is simple at the time of writing might not be simple to someone who hasn't seen the code before.
def double(x):
'''Double a number'''
return(x * 2)
For complex functions that take a number of parameters the numpy
docstring format is recommended. Text editor plugins can be used to autogenerate much of the docstring. This example is perhaps a bit more than you would need to write - but sets out a nice layout for writing docstrings.
def nanmax(a, axis=None, out=None, keepdims=np._NoValue):
"""
Return the maximum of an array or maximum along an axis, ignoring any
NaNs. When all-NaN slices are encountered a ``RuntimeWarning`` is
raised and NaN is returned for that slice.
Parameters
----------
a : array_like
Array containing numbers whose maximum is desired. If `a` is not an
array, a conversion is attempted.
axis : {int, tuple of int, None}, optional
Axis or axes along which the maximum is computed. The default is to compute
the maximum of the flattened array.
out : ndarray, optional
Alternate output array in which to place the result. The default
is ``None``; if provided, it must have the same shape as the
expected output, but the type will be cast if necessary. See
`doc.ufuncs` for details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original `a`.
If the value is anything but the default, then
`keepdims` will be passed through to the `max` method
of sub-classes of `ndarray`. If the sub-classes methods
does not implement `keepdims` any exceptions will be raised.
Returns
-------
nanmax : ndarray
An array with the same shape as `a`, with the specified axis removed.
If `a` is a 0-d array, or if axis is None, an ndarray scalar is
returned. The same dtype as `a` is returned.
Notes
-----
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
Positive infinity is treated as a very large number and negative
infinity is treated as a very small (i.e. negative) number.
If the input has a integer type the function is equivalent to np.max.
Examples
--------
>>> a = np.array([[1, 2], [3, np.nan]])
>>> np.nanmax(a)
3.0
"""