Bharath M R's blog

GSoC Last Week

This happens to be the last week of GSoC. The major things I accomplished this week are:

  • Got pylab to work interactively.
  • Made more changes to the documentation of plotting module.

I have a pull request for the restructured plotting module here. There has been a lot of discussion in the pull request about how the new plot API should look. The API as of now has five functions:

  • plot_line, which plots 2D line plots; I think I will change it to plot.
  • plot_parametric, which plots 2D parametric plots.
  • plot3D, which plots 3D plots.
  • plot3D_parametric, which plots 3D parametric line plots; I think I will have to change it to plot_parametric3D.
  • plot3D_surface, which plots 3D parametric surfaces.

The names are slightly confusing, but the alternatives to these names are long. If you have any good names for the 3D plots, please leave them in the comments.

I will have another post describing the things I learnt over this GSoC period.

GSoC Week 11

I got my adaptive sampling branch merged last week. Now the plots are sampled adaptively and are more accurate. I also added a lot of tests to the implicit plotting branch, and its coverage is now greater than 90%.

One of the major things decided in the previous week was to restructure the plot function. Presently plot is a single function which, depending on its input, renders a 2D or a 3D plot. Though it plots the right kind of plot, the function is quite complex, and it was decided to split it into smaller functions that each plot a particular type of plot. I tried an approach where all 2D plots are plotted by a plot2d function, 3D plots by plot3D, and the existing plot_implicit plots regions and implicit equations. Aaron mentioned that the API was still very complex, as I was using tuples and lists to differentiate between a parametric plot and a 2D line plot, and he was right. It is a bit complex, and it was decided to have a function for each kind of plot.
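
The ambiguity Aaron pointed out is easy to see in code. Below is a rough sketch of the kind of input-based dispatch a single entry point has to do; the function name and tuple conventions here are my own illustration, not the actual proposed API:

```python
def plot_kind(exprs):
    # Guess the plot type from the *shape* of the input alone.
    # A bare expression means a 2D line plot; tuples of expressions
    # have to be disambiguated by their length.
    if not isinstance(exprs, (tuple, list)):
        return "2d-line"             # e.g. plot(x**2)
    if len(exprs) == 2:
        return "2d-parametric"       # e.g. plot((cos(t), sin(t)))
    if len(exprs) == 3:
        return "3d-parametric-line"  # e.g. plot((cos(t), sin(t), t))
    raise ValueError("cannot tell which kind of plot was meant")
```

With a separate function per plot type, the intent lives in the function name rather than in the shape of the arguments.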

I think I can have the new plot functions in a PR by next week, and I would like to try getting a Mayavi backend ready by the end of my GSoC period.

I forgot to mention why I deviated from what I said I would do in my GSoC application. I tried getting an svgfig backend ready for almost one and a half weeks, and it was quite difficult. svgfig is not being updated, and I had a hard time getting the axis tick labelling to work most of the time. I wrote to the project maintainer many times and he helped me with a lot of things, but the library was not polished enough to use with SymPy Live. So plotting for SymPy Live should be attempted with a JavaScript plotting library rather than a Python backend. If we get matplotlib support on GAE, that would be awesome.

GSoC Week 9

This has been a really unproductive week. I was sick with fever for almost three days and could not spend my time on anything. I spent the remaining days getting a basic svgfig backend for 2D line plots working. There are lots of issues with svgfig, and hence I am of the opinion that svgfig should be used only for displaying images on Google App Engine, i.e. SymPy Live. First on the list is that there is no support for 3D graphs. I think this is OK, because there are not many libraries, even in JavaScript, which can do 3D plotting. Also, I am having problems implementing contour plots and surface plots in svgfig. I am experimenting with an approach that uses the marching squares algorithm to plot contour plots.

I think I am a little behind my GSoC schedule, and I should speed things up a little in the next few weeks.

So these are the things that I have to address:

  • Integration of svgfig with SymPy Live.
  • Fix the issue of matplotlib spawning multiple windows.
  • Fix the plot tests. As of now, the tests do nothing, as process_series is not called if show is set to False.
  • I have been toying around with IPython to get the isympy notebook and qtconsole working. The problem I am facing is that two instances of qtconsole are created instead of one when I run it. I will have to figure out the problem.
  • Address the issues regarding the adaptive sampling of 2D plots.
  • Clean up my branch of implicit plotting (This is almost done).
  • Split the plot function into plot, plot3d, implicit_plot functions.

I don’t think I will be able to do all of these by the end of the GSoC period, but my priority will be getting the implicit plotting and the svgfig backend working and getting my pull requests merged.

GSoC Week 7

This week has been quite eventful. The implicit plotting module is almost done. I added the functionality of combining expressions using And and Or. Now you can do

```python
plot_implicit(And(Eq(y, exp(x)), y - x > 2))
```

and get a plot as below. So now you can combine any number of equations / inequalities and plot them. I think it's possible to do a lot of cool stuff by combining equations / inequalities.

Plotting through interval math is awesome, but it is also very limited. You cannot add support for re(), or for functions that cannot be characterized as monotonic in certain regions. But we encounter such functions all the time, so there should be some fallback algorithm for plotting them. I implemented the fallback algorithm last week. The idea is borrowed from Sage's implicit plots. We convert an equation / inequality into a function which returns 1 where it is satisfied and -1 where it is not. If you are plotting an equality, you plot using the contour command of matplotlib and instruct it to plot only the zero contour. If it is an inequality, then plotting the region with two colors gives the required plot.
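
A minimal sketch of that idea (the function name and grid parameters are mine, not the actual implementation): sample the relation on a grid, map it to +1 / -1, and hand the result to matplotlib.

```python
import numpy as np

def boolean_grid(holds, xlim, ylim, n=200):
    # Sample the relation on an n-by-n grid:
    # +1.0 where it holds, -1.0 where it doesn't.
    X, Y = np.meshgrid(np.linspace(xlim[0], xlim[1], n),
                       np.linspace(ylim[0], ylim[1], n))
    return X, Y, np.where(holds(X, Y), 1.0, -1.0)

# The inequality y - x > 2 over [-5, 5] x [-5, 5]:
X, Y, Z = boolean_grid(lambda x, y: y - x > 2, (-5, 5), (-5, 5))
# plt.contourf(X, Y, Z)      # inequality: two-color region plot
# plt.contour(X, Y, Z, [0])  # equality: draw only the zero contour
```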

These are examples from the fallback algorithm.

Plot of $y^{2}=x^{3}-x$

The plot with interval arithmetic is more precise.

I haven’t finished the tests. Once I finish them I can send a pull request. The pull request will be pretty big, but most of it has already been reviewed in my previous pull request; this is just an extension of it.

There are certain problems with the module though. The line width problem which I mentioned in my previous blog post cannot be fixed, so you will have to switch to the fallback method if the line width becomes large. Also, the fallback algorithm cannot plot boolean combinations of equations / inequalities. So the two methods largely complement each other. The next question is whether we can choose between the two intelligently. I guess the answer is no; that decision must be taken by the user. But most of the time the interval math approach works very nicely.

GSoC Week 6

I have been trying to improve the implicit plotting module this week, but I have hit a roadblock. I have almost run out of ideas for solving the problem.

Description:

The implicit plotting algorithm I implemented works as follows:

1) Take an x and a y interval. If the expression is satisfied throughout the interval, plot it.

2) If the expression is not satisfied anywhere in the interval, throw the interval away.

3) If it is partially satisfied, recursively subdivide the interval into four and try again.

For equalities, the first condition never holds due to floating point errors. So we go on eliminating regions, and after a certain depth we plot the remaining regions: these are the regions containing at least one solution. This is why the plots are rasterized. But there is a bigger inherent problem here. For expressions like $x^{3}$, even if the x interval is small, the resulting interval after computation will be large. Sometimes, due to these large intervals, lots of x and y intervals satisfy the expression spuriously. Even if we make the x interval really small, the corresponding y interval will be large, i.e. the line widths become large. This explanation is more of a guess than a definitive one.
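
The three steps above can be sketched as a small quadtree recursion. This is an illustration with a hand-written interval test for the inequality x > 0, not the actual interval-arithmetic implementation:

```python
def plot_boxes(box, holds, depth=0, max_depth=4, out=None):
    # box is (x0, x1, y0, y1); holds(box) answers "all", "none" or "partial".
    if out is None:
        out = []
    verdict = holds(box)
    if verdict == "all":
        out.append(box)          # 1) satisfied throughout: plot it
    elif verdict == "partial":
        if depth >= max_depth:
            out.append(box)      # undecidable at max depth: plot the region
        else:                    # 3) partially satisfied: subdivide into four
            x0, x1, y0, y1 = box
            xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
            for sub in ((x0, xm, y0, ym), (xm, x1, y0, ym),
                        (x0, xm, ym, y1), (xm, x1, ym, y1)):
                plot_boxes(sub, holds, depth + 1, max_depth, out)
    # verdict == "none":         2) not satisfied anywhere: throw it away
    return out

def x_positive(box):
    # Interval test for x > 0: decidable from the box's x-bounds alone.
    x0, x1, _, _ = box
    if x0 > 0:
        return "all"
    if x1 <= 0:
        return "none"
    return "partial"

boxes = plot_boxes((-1.0, 1.0, -1.0, 1.0), x_positive)
```

The boxes collected for x > 0 tile exactly the right half of the square, with thin undecided strips hugging the boundary x = 0, which is precisely the rasterization described above.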

Examples:

Plot of $x^{y}=y^{x}$. Even if I increase the depth of recursion to higher values, the thickness decreases but doesn’t vanish. The plot should actually be two separate curves.

Mac OS X’s Grapher uses a similar algorithm (a guess, because it produces similar rasterization) but takes care of the line widths.

If you feel you know where the problem is, please comment or email me. :)

GSoC Week 5

This week has mostly been bug fixing and working on migrating the SymPy IPython profile into SymPy. I also wanted to add the functionality of ipython -c qtconsole. So it has mostly been hanging out in the IPython IRC channel, asking lots of questions about how IPython works. I am really thankful to minrk, who patiently taught me how to do most of the stuff. There are a few problems that I am facing, but I think I will have the qtconsole ready in a day.

I also submitted pull request #1370 for my initial work on implicit plotting. Except for the bug of changing line thickness, it works pretty nicely. Please feel free to play with it and comment on the pull request if you encounter any bugs.

GSoC Week 3

I have almost finished the basic framework of implicit plotting based on interval arithmetic. The module implements both continuity tracking and domain tracking; hence it does not plot points which are not in the domain of the function. The functionality is best illustrated by plots. There are also a couple of limitations that I encountered, which I think are difficult to avoid. I will illustrate both the functionality and the problems through plots.

The above image illustrates a plot with domain tracking and continuity tracking. It is not possible for interval arithmetic without tracking to decide whether to draw the plot near zero, but with continuity tracking we get an accurate plot.

The above plot is that of $y = \frac{1}{\tan{\left (x \right )}}$. It is possible to see the small discontinuity near multiples of $\pi / 2$, as $\pi / 2$ is not in the domain of the expression.

The above plot illustrates how sqrt does not plot anything outside its domain. Even though this appears insignificant, it becomes significant when a huge expression is provided as the argument to the function.

Illustrations of more plots

Plot of $y^{2}=x^{3}-x$

The above plot took 19.26 seconds to render.

Problems

The problem with plots using interval arithmetic is that the error increases with the length of the expression, since interval arithmetic keeps only the lowermost and uppermost bounds at each step. The effect of the error can be seen in the following plot: the line thickens when the expression reaches a maximum or a minimum. This is due to the error creeping in; the interval becomes wide even for the smallest of x intervals.

It is better illustrated in the plot below. It is possible to see the width of the line increasing and then decreasing.
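
The widening is the classic dependency problem of interval arithmetic: the two occurrences of x in an expression like $x^{2} - x$ are treated as independent. A tiny hand-rolled interval class (an illustration, not mpmath's library) shows the blow-up near the minimum at $x = 1/2$:

```python
class Interval:
    """Minimal closed-interval arithmetic, just enough for x**2 - x."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def squared(self):
        # x**2 is monotonic away from 0, so the bounds come from the
        # endpoints, except that the lower bound is 0 when the interval
        # straddles 0.
        cands = (self.lo * self.lo, self.hi * self.hi)
        lo = 0.0 if self.lo <= 0.0 <= self.hi else min(cands)
        return Interval(lo, max(cands))

    @property
    def width(self):
        return self.hi - self.lo

x = Interval(0.4, 0.6)
y = x.squared() - x   # the two x's are treated as independent
# y is roughly [-0.44, -0.04], width 0.40, while the true range of
# x**2 - x on [0.4, 0.6] is [-0.25, -0.24], width only 0.01.
```

An input interval of width 0.2 produces an output interval forty times wider than the true range, which is exactly the line-thickening seen near extrema.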

The next problem is rasterization. To avoid rasterization, I tried using matplotlib’s contourf function, which implements the marching squares algorithm. Though it smoothens the curves, there is still a fair bit of rasterization. The plot below is a zoomed version of $y=\sin(x)$.
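
The core of marching squares is simple: classify each grid cell by the signs of the function at its four corners, then place the crossing point on each sign-changing edge by linear interpolation. A minimal sketch of those two ingredients (my own version, not matplotlib's internals):

```python
import numpy as np

def cell_case(f00, f10, f11, f01):
    # 4-bit marching-squares case index built from the corner signs,
    # counterclockwise from the (0, 0) corner. 0 and 15 mean "no contour".
    return int((f00 > 0) | ((f10 > 0) << 1) |
               ((f11 > 0) << 2) | ((f01 > 0) << 3))

def zero_crossing(p0, p1, f0, f1):
    # Linear interpolation of the zero along the edge p0 -> p1; this is
    # what smooths the contour compared to plotting raw pixel rectangles.
    t = f0 / (f0 - f1)
    return p0 + t * (p1 - p0)
```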

Presently the plotting function supports expressions containing cos, sin, tan, exp, log, sqrt and atan. Implementing more functions is fairly easy; I should be able to finish most of the functions that can be implemented next week. I will then look into plotting implicit equations for expressions which cannot be handled using interval arithmetic.

GSoC Week 2

I worked on interval arithmetic using numpy this week. I have almost got the module ready. I have to integrate it with Stefan’s branch, and then a basic version of implicit plotting will be ready to go. I will update this blog post with plots and performance results once I integrate it with Stefan’s branch.

Adaptive Sampling for 2D Plots

This was my first week of GSoC, and I spent it experimenting with adaptive sampling. The major question explored was what constitutes a condition under which we need not sample further to obtain an accurate plot. I started with the idea that the area of the triangle formed by three consecutive points should be less than a tolerance value. This worked nicely but oversampled unnecessarily: the area of the triangle depends on the distance between the points, so the condition was length-dependent and oversampled even when the three points were almost collinear. The obvious next idea was to check the angle formed by the three points and see whether it is close to 180 degrees. Three versions of the above algorithm were implemented, one of which was an iterative version of a recursive solution. The iterative version is here. Following Stefan Krastanov’s suggestion, I implemented a recursive solution which samples 5 additional points between two points instead of a single point. The idea was to use numpy’s quick evaluation of arrays and to arrive at the straight-line condition faster. It also reuses most of the code written before. The code can be found here. A snippet of the code is as follows:

```python
def get_adapt_segments(self):
    f = vectorized_lambdify([self.var], self.expr)
    list_segments = []

    def sample(segment, depth):
        if depth > 5:
            list_segments.append(segment)
        else:
            # resample 5 points across the segment's x-extent
            new_sampling = np.linspace(*segment[:, 0], num=5)
            new_segments = self.get_segments((new_sampling, f(new_sampling)))
            for segmentA, segmentB in zip(new_segments[:-1], new_segments[1:]):
                if not flat(segmentA, segmentB):
                    sample(segmentA, depth + 1)
                else:
                    list_segments.append(segmentA)
            # sample the last segment
            if not flat(new_segments[-2], new_segments[-1]):
                sample(new_segments[-1], depth + 1)
            else:
                list_segments.append(new_segments[-1])

    points = np.linspace(self.start, self.end, 16)
    yvalues = f(points)
    segments = self.get_segments((points, yvalues))
    for segment in segments:
        sample(segment, 0)
    return list_segments


def flat(segmentA, segmentB):
    # the segments are "flat" when the angle between them is close to
    # 180 degrees, i.e. cos(theta) is close to -1
    vectorA = segmentA[0] - segmentA[1]
    vectorB = segmentB[1] - segmentB[0]
    costheta = np.dot(vectorA, vectorB) / (np.linalg.norm(vectorA) *
                                           np.linalg.norm(vectorB))
    return abs(costheta + 1) < 0.0005
```

The major problem with the above approach is the way the rightmost point / segment is handled. The rightmost segment does not have another segment to its right to decide whether it forms a 180 degree angle or not, so it is assumed straight if the previous segment and the present segment form a straight line. Most of the time this fails to sample further for the end segment though it should have. The problem can be seen in a plot of $y = \sin(x^{2})$.

The last method used is symmetric and gives better results, but it is quite ugly. The branch is here. (EDIT: changed the link.) It uses some amount of random sampling to avoid aliased results. The plot of $y = \sin(x^{2})$ renders very accurately. Feel free to experiment with it, and if there is a better method, you can comment below :).

I think I will get non-ugly code ready by tomorrow and wait for Stefan’s branch to get merged before submitting this method as a pull request. This week has been lots of experimentation. I think I will spend the next week getting a basic version of interval arithmetic ready using numpy.

Region Plots With Interval Arithmetic

My GSoC project is to provide support for implicit plotting using interval arithmetic. As mpmath already has a very good interval arithmetic library, I wanted to try out how efficient the algorithm is going to be using it. I wanted to get an idea of the time required for plotting, and also to decide whether to write my own interval arithmetic library or to use the existing mpmath library and add things to it. I have a basic implementation which supports only the mpmath interval arithmetic functions. The results look promising, but I am guessing a separate implementation for plotting will be faster and will let me add features more easily. I have an image of $y > 1/x$ with the interval edges below. The image was plotted with a resolution of 1024x1024. It is possible to see how the intervals are subdivided more and more as they approach the edge of a region.

It took 1.57 seconds to render this image, which is decently fast. I observed that if the independent regions are few and large, then the time taken for the plot to render is high. I tried $cos(y) > sin(x)$, which took about 5.3 seconds to render.

I wanted to see the maximum time it could take to render something, so I tried plotting $sin^{2}x+cos^{2}x < 1$. As the arithmetic is done on intervals, it is not possible for the algorithm to decide that the expression is false throughout an interval. So it goes on subdividing more and more until it reaches a dimension of 1 pixel. For a resolution of 512x512, it took 120 seconds to render. If there are a lot of evaluations in the expression it might take longer, but we should expect times of around 120 seconds.
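
The effect is easy to reproduce directly with mpmath's iv context (assuming mpmath is installed; the exact bounds depend on the working precision):

```python
from mpmath import iv

x = iv.mpf([0.1, 0.2])                # a narrow x interval
r = iv.sin(x) ** 2 + iv.cos(x) ** 2   # mathematically this is exactly 1
# r is a genuine interval straddling 1 (r.a < 1 < r.b, r.delta > 0), so
# the test "r < 1" stays undecidable no matter how narrow x gets, and
# the algorithm subdivides all the way down to pixel size.
```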

Another problem I have to address is rasterization. I am really not getting any ideas on how to avoid it. One way is to handle the zoom event in matplotlib and change the data to match the zoom, but for complicated graphs re-evaluating might take a lot of time, which is bad.

We can see that if there were a way of interpolating over the rectangular edges, then we would have a plot without rasterization. I haven’t got a foolproof idea for implementing this interpolation, as there will be many independent regions. So if you have any ideas, please comment or mail me :). The code for plotting can be found here.