
Imports

dataframe, value, column

Procs

proc parseCsvString(csvData: string; sep: char = ','; header: string = ""; skipLines = 0; toSkip: set[char] = {}; colNames: seq[string] = @[]; skipInitialSpace = true; quote = '\"'; maxGuesses = 20; lineBreak = '\n'; eat = '\r'): DataFrame {. ...raises: [IOError, ValueError], tags: [].}: Parses a DataFrame from a string containing CSV data.

toSkip can be used to skip optional characters that may be present in the data. For instance, if a CSV file is comma-separated but contains additional whitespace (5, 10, 8 instead of 5,10,8), it can be parsed correctly by setting toSkip = {' '}.

header designates the symbol that defines the header of the CSV file. By default it is empty, meaning that the first line will be treated as the header. If a header is given, e.g. "#", the column names are determined from the first line (which has to start with #) and every following line is skipped until the first line that does not start with #.

skipLines is used to skip the first N lines of the file.

maxGuesses is the maximum number of rows to look at before we give up trying to determine the datatype of the column and set it to 'object'.

lineBreak is the character used to detect the start of a new line. eat, on the other hand, is simply ignored. For Unix-style line endings the defaults are fine. In principle the defaults should also work for Windows-style endings (\r\n), but in rare cases they cause issues with mismatched line counts. In those cases try swapping lineBreak and eat.
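As an illustration of toSkip, here is a minimal sketch that parses a string whose values are separated by a comma plus a space. It assumes the datamancer package is installed and that `len` and `getKeys` are available on a DataFrame; the sample data is made up for this example.

```nim
import datamancer

# Made-up sample data: the header row is plain, the data rows contain
# an extra space after each comma (5, 10, 8 instead of 5,10,8).
const csvData = "x,y,z\n5, 10, 8\n1, 2, 3\n"

# toSkip = {' '} asks the parser to ignore the extra spaces.
let df = parseCsvString(csvData, sep = ',', toSkip = {' '})

echo df.len        # number of data rows
echo df.getKeys()  # column names taken from the first line
```

Because header is left empty here, the first line is treated as the header, as described above.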
proc readCsv(fname: string; sep: char = ','; header: string = ""; skipLines = 0; toSkip: set[char] = {}; colNames: seq[string] = @[]; skipInitialSpace = true; quote = '\"'; maxGuesses = 20; lineBreak = '\n'; eat = '\r'): DataFrame {....raises: [IOError, ValueError, HttpRequestError, Exception, LibraryError, OSError, SslError, TimeoutError, ProtocolError, KeyError], tags: [RootEffect, ReadIOEffect, WriteIOEffect, TimeEffect].}: Reads a DF from a CSV file or a web URL using the separator character sep.

fname can be a local filename or a web URL. If fname starts with "http://" or "https://" the file contents will be read from the selected web server. No caching is performed so if you plan to read from the same URL multiple times it might be best to download the file manually instead. Please note that to download files from https URLs you must compile with the -d:ssl option.

toSkip can be used to skip optional characters that may be present in the data. For instance, if a CSV file is comma-separated but contains additional whitespace (5, 10, 8 instead of 5,10,8), it can be parsed correctly by setting toSkip = {' '}.

header designates the symbol that defines the header of the CSV file. By default it is empty, meaning that the first line will be treated as the header. If a header is given, e.g. "#", the column names are determined from the first line (which has to start with #) and every following line is skipped until the first line that does not start with #.

skipLines is used to skip the first N lines of the file.

colNames can be used to overwrite the names of the columns in the header (or to supply them if the file has none). This is also useful if the header does not conform to the separator of the file. Note: if you supply custom column names but the file contains a header, make sure to use skipLines to skip that header, as no header information is parsed when colNames is supplied.

maxGuesses is the maximum number of rows to look at before we give up trying to determine the datatype of the column and set it to 'object'.

lineBreak is the character used to detect the start of a new line. eat, on the other hand, is simply ignored. For Unix-style line endings the defaults are fine. In principle the defaults should also work for Windows-style endings (\r\n), but in rare cases they cause issues with mismatched line counts. In those cases try swapping lineBreak and eat.
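The interplay of colNames and skipLines can be sketched as follows. This is an illustrative example only: it assumes the datamancer package is installed, and the file name and its contents are made up. The file has a real header line, so skipLines = 1 is used to skip it, as the note on colNames advises.

```nim
import std/os
import datamancer

# Hypothetical temporary file with a header we want to replace.
let path = getTempDir() / "datamancer_colnames_example.csv"
writeFile(path, "first col,second col\n1,2\n3,4\n")

# Supply our own column names and skip the header line in the file,
# since no header is parsed when colNames is given.
let df = readCsv(path, skipLines = 1, colNames = @["a", "b"])

echo df.getKeys()  # the custom column names
echo df.len        # number of data rows
```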
proc readCsv(s: Stream; sep = ','; header = ""; skipLines = 0; colNames: seq[string] = @[]; fname = "<unknown>"): OrderedTable[string, seq[string]] {....raises: [IOError, OSError, CsvError, KeyError, Exception], tags: [ReadIOEffect, WriteIOEffect].}: Returns the CSV-like data in the given Stream as a table of header keys vs. seq[string] values, where idx 0 corresponds to the first data value.

header can be used to designate the symbol that marks the header line, e.g. #. In the signature above it defaults to the empty string, i.e. no symbol.

colNames can be used to provide custom names for the columns. If any are given and a header is present with a character indicating the header, it is skipped automatically. However, if custom names are desired and there is a real header without any starting symbol (i.e. header.len == 0), please use skipLines = N to skip it manually!
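A minimal sketch of the Stream overload, assuming the datamancer package is installed and that this overload is exported by the top-level datamancer module. The input data is made up; readCsvAlt is described as returning the same kind of table for a file name.

```nim
import std/[streams, tables]
import datamancer

# Made-up CSV data with a plain header line (no header symbol).
let s = newStringStream("a,b\n1,2\n3,4\n")

# The result maps each column name to its values, kept as strings.
let tab = readCsv(s)

echo tab["a"]  # values of column "a"; index 0 is the first data value
echo tab["b"]
```

Unlike the DataFrame-returning readCsv, no type guessing takes place here: every value stays a string.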
proc readCsvAlt(fname: string; sep = ','; header = ""; skipLines = 0; colNames: seq[string] = @[]): OrderedTable[string, seq[string]] {....raises: [IOError, OSError, CsvError, KeyError, Exception], tags: [ReadIOEffect, WriteIOEffect].}: Returns a CSV file as a table of header keys vs. seq[string] values, where idx 0 corresponds to the first data value.

header can be used to designate the symbol that marks the header line, e.g. #. In the signature above it defaults to the empty string, i.e. no symbol.

colNames can be used to provide custom names for the columns. If any are given and a header is present with a character indicating the header, it is skipped automatically. However, if custom names are desired and there is a real header without any starting symbol (i.e. header.len == 0), please use skipLines = N to skip it manually!
proc showBrowser(df: DataFrame; fname = "df.html"; path = getTempDir(); toRemove = false) {....raises: [ValueError, KeyError, IOError, OSError, Exception], tags: [WriteIOEffect, ExecIOEffect, ReadEnvEffect, RootEffect, TimeEffect, WriteDirEffect].}: Displays the given DataFrame as a table in the default browser.

Note: the HTML generation is not written for speed at this time. For very large dataframes expect bad performance.
proc writeCsv(df: DataFrame; filename: string; sep = ','; header = ""; precision = 4) {....raises: [ValueError, KeyError, IOError], tags: [WriteIOEffect].}: Writes a DataFrame to a "CSV" file (the separator can be changed). sep is the separator to be used. header indicates an optional symbol marking the header line, e.g. #.
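A short round-trip sketch for writeCsv with a non-default separator. It assumes the datamancer package is installed; the output file name and the sample data are made up for illustration.

```nim
import std/os
import datamancer

# Build a small DataFrame from made-up CSV data.
let df = parseCsvString("x,y\n1,2\n3,4\n")

# Hypothetical output path; write with a tab as the separator.
let outPath = getTempDir() / "datamancer_writecsv_example.tsv"
df.writeCsv(outPath, sep = '\t')

# Read the file back using the same separator.
let back = readCsv(outPath, sep = '\t')
echo back.len  # number of rows read back
```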

datamancer/io